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^This  thesis  compares  three  estimation  techniques  in 
application  to  the  beta  distribution:  method  of  moments, 
maximum  likelihood,  and  minimum  distance.  The  four  para¬ 
meter  version  of  the  beta  distribution  is  used;  it  has  two 
shape  parameters,  and  upper  and  lower  limit  parameters. 


initial  estimates  of  the  limits.  The  classical  estimation 
procedures,  method  of  moments  and  maximum  likelihood,  are 
applied  through  procedures  found  in  the  literature.  A  newer 
technique,  minimum  distance,  is  applied  for  the  first  time 
to  the  beta  distribution. 

Comparision  of  estimation  techniques  is  accomplished 
using  Monte  Carlo  analysis.  Five  sample  sizes  are 
considered  --  4,  8,  12,  16,  and  20  —  and  three  pairs  of 
shape  parameters  —  (3,3),  (9,4),  and  (1,2)  —  for  a  total 
of  fifteen  cases.  One  thousand  samples  are  generated  for 
each  case,  and  each  estimation  technique  is  then  applied  to 


all  samples...  Two  effectiveness  measures  are  used;  they  are 


ror  of  each  parameter  estimate,  and  the 


Cramer-von  MisesNdistance  between  the  estimated  and  the  true 


distribution.  Thesb  effectiveness  measures  are  compared  in 


overall  effectiveness 


COMPARISON  OF  ESTIMATION  TECHNIQUES 
FOR  THE  FOUR  PARAMETER  BETA  DISTRIBUTION 

I .  Introdnct ion 

Statistical  estimation  is  currently  used  in  private 
industry,  in  government,  and  in  the  military.  Areas  of 
application  include  quality  control,  logistics,  and  simula¬ 
tion.  As  estimation  theory  has  been  studied  over  time, 
different  techniques  have  been  developed  for  finding  esti¬ 
mates  of  the  parameters  of  probability  distributions  based 
on  a  sample  from  that  distribution.  Therefore,  there  is  a 
need  to  perform  a  comparison  of  estimation  methods  for 
specific  distributions  and  determine  whether  any  one  method 
ont-performs  the  others.  The  research  performed  for  this 
thesis  undertakes  such  a  comparison  for  the  four-parameter 
beta  distribution;  the  estimation  methods  compared  are  the 
method  of  moments,  maximum  likelihood,  and  minimum  distance. 

The  following  hypothetical  situation  illustrates  how 
the  results  of  this  thesis  might  be  used.  A  large  interna¬ 
tional  conglomerate,  known  as  the  Bertrand  Corporation, 
produces  a  highly  complex,  technologically  sophisticated 
piece  of  equipment  called  the  widget.  The  president  of  the 
corporation,  a  very  wise  and  knowledgeable  man,  realizes 
that  his  customers  will  require  information  on  how  long 
their  widgeta  will  last.  He  therefore  would  like  to  know 
the  probability  distribution  of  the  time  to  failure  (TTF)  of 
his  product. 

The  desired  information  could  be  obtained  in  a  number 
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of  ways.  First,  every  widget  produced  could  be  operated 
until  it  failed,  and  the  length  of  operation  recorded.  This 
technique  would  provide  perfect  information  on  the  TTF  of 
each  item;  however,  Bertrand  Corporation  stock  could  be 
expected  to  drop  drastically  due  to  lack  of  sales. 

A  less  costly  method  would  be  to  start  by  assuming  that 
the  TTF  is  normally  distributed.  Then,  a  random  sample  of 
the  widgets  could  be  taken  from  those  produced,  and  run  to 
failure.  The  mean  and  variance  of  this  sample  could  then  be 
used  as  estimates  of  the  mean  and  variance  of  the  underlying 
normal  distribution.  The  difficulty  inherent  in  this  method 
is  that  the  normality  assumption  may  not  be  valid.  If  the 
analyst  has  no  idea  of  the  shape  of  the  underlying  distribu¬ 
tion,  he  or  she  cannot  be  sure  whether  or  not  a  normal  curve 
can  be  made  to  fit  it  with  reasonable  accuracy. 

The  third  method  is  to  assume  as  the  underlying  distri¬ 
bution  one  which  can  take  on  a  large  variety  of  shapes,  and 
then  take  a  random  sample  of  widgets  from  which  to  find 
estimates  of  the  parameters  of  this  distribution.  The  pres¬ 
ident  of  Bertrand  Corporation  knows  that  the  beta  dis¬ 
tribution  can  take  on  many  shapes,  but  also  realizes  that 
there  are  several  methods  available  to  perform  the  estima¬ 
tion.  He  therefore  would  like  to  know  what  method  will  give 
him  the  most  accurate  information  on  the  time  to  failure  of 
the  widgets,  so  that  he  can  pass  this  information  on  to  his 
customers . 

This  thesis  is  undertaken  in  order  to  provide  the 
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aforementioned  president,  or  anyone  in  a  similar  sitnation, 
the  answer  that  he  or  she  requires.  The  two  classical 
methods  of  estimation,  method  of  moments  and  maximum  likeli¬ 
hood,  and  a  more  recent  technique  called  minimum  distance 
estimation,  will  be  used  to  estimate  the  parameters  of  the 
four  parameter  beta  distribution.  The  techniques  will  then 
be  compared  with  each  other,  to  determine  if  any  one  method 
provides  superior  results.  This  thesis  report  will  proceed 
in  four  parts.  First,  the  three  estimation  techniques  will 
be  reviewed  in  general.  Second,  the  beta  distribution  will 
be  described  and  application  of  the  estimation  methods  to  it 
will  be  discussed.  Third  will  be  a  summary  of  the  Monte 
Carlo  analysis  performed  in  order  to  evaluate  the  techniques 
and  make  the  desired  comparisons.  Last,  the  results  and 
conclusions  of  this  thesis  will  be  presented  along  with 
suggestions  for  future  work  on  the  subject. 
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1 1 .  E» t ima t ion  Techniques 


In  the  introduction,  it  was  stated  that  this  thesis 
would  compare  method  of  moments,  maximum  likelihood,  and 
minimum  distance  estimation  of  the  beta  distribution.  This 
chapter  will  provide  some  background  and  general  theory  on 
these  three  estimation  techniques.  First,  however,  a  few 
words  about  estimation  in  general  are  in  order.  Estimation 
involves  finding  approximations,  or  estimates,  of  the  para¬ 
meters  of  a  probabililty  distribution  through  the  use  of 
estimators.  Mendenhall  and  Scheaffer  define  an  estimator  as 
"a  rule  that  tells  us  how  to  calculate  an  estimate  based  on 
the  measurements  contained  in  a  sample"  (Eef  19:264).  The 
estimation  process,  then,  is  one  of  taking  a  sample  from  the 
population  of  interest,  performing  calculations  on  the 
sample  points  according  to  the  "rule"  of  the  estimator, 
and  using  the  results  of  these  calculations  as  the  estima¬ 
tors  of  the  parameters  of  the  underlying  distribution. 

Next,  this  chapter  will  consider  in  detail  three 
specific  rules,  or  techniques,  used  for  estimation.  The 
first  two  are  known  as  the  classical  estimation  techniques; 
the  third  is  a  more  recent  method. 

Method  of  Moments 

"The  method  of  moments  is  one  of  the  oldest  estimation 
techniques'*  (Ref  4:7).  The  basic  idea  is  fairly  simple; 
use  as  estimators  those  values  of  the  parameters  fox  which 
sample  moments  equal  population  moments  (Ref  19:300).  The 
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k t h  sample  moment  is  computed  from  the  sample  as  follows 
(Ref  19:300) : 


m '  *  -)x*  (2.1) 

where  n  is  the  sample  size  and  X^  is  the  ith  sample  point. 
The  kth  population  moment  is  derived  from  the  underlying 
distribution  using  the  formula  (Ref  19:300): 


=•  E(Xk) 


(2.2) 


The  population  moment  will  therefore  be  a  function  of  the 
parameters  of  the  underlying  distribution.  Let  p  be  the 
number  of  parameters  to  be  estimated.  Setting  up  the 
equa  t ions 


*  m£  k»l,..., p  (2.3) 

provides  a  system  of  p  equations  in  the  p  unknown 
parameters,  which  may  be  solved  to  find  the  method  of  mom¬ 
ents  estimators  of  the  parameters. 

It  is  sometimes  convenient  to  use  functions  of  the 
moments,  rather  than  the  moments  themselves,  when  performing 
method  of  moments  estimation.  This  can  be  done,  provided 
the  total  number  of  different  moments  used  is  still  equal  to 
the  number  of  parameters  being  estimated.  Two  such  func¬ 


tions  which  are  commonly  used  are  the  skewness  and  the 


kurtosis.  Before  defining  these,  it  is  necessary  to  define 
a  different  class  of  moments,  known  as  the  central  moments. 
The  kth  central  moment,  or  moment  abont  the  mean,  of  the 
sample  is  (Ref  18:18): 


"k 


(2.4) 


where  X  is  the  sample  mean  (X  **  m'  *  n  .  ).  The  kth  popula¬ 
tion  central  moment  is  (Ref  18:18): 


,ik  -  E (  (X  -  p)k) 


(2.5) 


where  p  =  E ( jj)  is  the  mean  of  the  distribution.  The  central 

A 

moments  are  used  in  finding  the  sample  skewness,  E ^ ,  and 
population  skewness,  ,  as  follows  (Ref  18:20-21): 


B,3/<n2) 


3/2 


(2.6) 


x3  *»  p3/(p2)3/2 


(2.7) 


The  sample  and  population  kurtoses  are  derived  from  the 
following  formulas,  the  again  indicating  the  sample 
statistic  (Ref  18:20-21): 


K4  -  m4/(m2)2  (2.8) 

C4  -  p4/(l»2*  (2.9) 
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Since  the  first  and  second  moients,  skewness,  and  kurtosis 
together  involve  only  the  first  four  moments,  they  can  be 
used  to  find  moment  estimates  for  four  parameters. 

Maximum  Like  1 ihood 

The  method  of  maximum  likelihood  uses  as  estimators  the 
parameter  values  which  maximize  the  likelihood,  or  joint 
density,  of  the  sample  (Ref  19:303).  Let  f(y;0)  be  the 
underlying  probability  distribution,  where  0  =  (0,  ,0.  ,..,0  ) 

12  p 

is  a  vector  of  parameter  values.  Since  the  sample  is  chosen 
at  random,  the  joint  density  of  the  sample  is  merely  the 
product  of  the  probability  distribution  evaluated  at  each  of 
the  sample  points  (Ref  19:171);  the  likelihood  is  therefore 
defined  by  the  formula. 


n 

LU1,...,Xn;0)  -  J]*f(X.;0)  (2.10) 


The  goal  is  to  find  the  vector  0  which  maximizes  this 
function.  This  can  be  done  by  setting  each  of  the  p  partial 
derivatives  3L/  30^.  k*l,..,p  to  zero  and  solving  for  the 

A 

0^.  In  practice,  it  is  usually  expedient  to  take  the 
natural  logarithm  of  L  before  maximizing;  this  transforms 
the  product  into  a  sum,  which  is  easier  to  differentiate. 

A 

Maximizing  ln(L)  will  result  in  the  same  values  for  0^, 


function  (Ref  19:303).  Therefore,  the  maximum  likelihood 


estimators  are  the  roots  of  the  p  simultaneous  equations 


L(  ^  , ... ,  x  q;01  , .. »  0  p)  -  0  ,  k  =  l...,p  (2.11) 

In  some  cases,  the  estimators  cannot  be  solved  for  in  closed 
form  and  must  be  found  by  iteration  (Ref  4:6). 

Minimum  Distance 

Minimum  distance  estimation  has  been  developed  more 
recently  than  the  previous  two  methods;  its  development  took 
place  in  the  1950's.  This  method  evolved  from  attempts  by 
estimation  theorists  to  strike  a  balance  between  the  charac¬ 
teristics  of  robustness  and  consistency.  The  term  robust¬ 
ness  refers  to  the  ability  of  an  estimator  to  adapt  to 
deviations  in  the  underlying  model  and  remain  efficient  (Ref 
21:3).  An  estimator  is  consistent  if  it  converges  in  proba¬ 
bility  to  the  true  value  of  the  parameter  as  the  sample  size 
tends  to  infinity  (Ref  19:309).  The  initial  work  in  minimum 
distance  estimation  was  performed  by  J.  Wolfowitz.  He 
published  a  paper  in  1953  (Ref  26),  and  another  in  1957 
(Ref  27)  which  outlined  the  minimum  distance  method  and 
showed  it  to  be  consistent;  "in  a  wide  variety  of  cases, 
[the  minimum  distance  method!  will  furnish  super-consistent 
estimators  even  when  classical  me thods...f a  il  to  give  con¬ 
sistent  estimators"  (Ref  26:9). 

It  has  not  been  until  recently,  however,  that  minimum 
distance  estimation  has  begun  to  be  widely  applied.  In 
their  1979  paper,  Parr  and  Schucany  applied  the  method  to 
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estimation  of  the  location  parameter  of  a  symmetric  distri¬ 
bution,  emphasizing  the  normal  distribution,  and  found  it  to 
yield  "strongly  consistent  estimators  with  excellent  ro¬ 
bustness  properties"  (Ref  21:5).  Other  applications  have 
been  accomplished  at  the  Air  Force  Institute  of  Techology 
(AFIT)  under  the  supervision  of  Dr.  Albert  H.  Moore.  They 
are  estimation  of  the  location  parameters  of  the  generalized 
exponential  power  distribution  by  Maj.  Larry  McNeese  (Ref 
17),  estimation  of  the  parameters  of  the  generalized  t 
distribution,  by  Capt.  Tony  Daniels  (Ref  4),  estimation  of 
the  three  parameter  weibull  distribution,  by  Capt.  Robert 
Miller  (Ref  20),  and  estimation  of  the  three  parameter  gamma 
distribution,  by  Capt.  William  L.  James  (Ref  13).  These 
studies  generally  found  the  minimum  distance  estimators  to 
be  better  than  the  classical  methods.  This  thesis  is  a 
continuation  of  these  efforts. 

The  minimum  distance  method  is  an  extention  of  the 
goodness  of  fit  tests  used  in  testing  hypotheses.  To  test, 
by  goodness  of  fit,  the  hypothesis  that  a  sample  is  from  a 
certain  distribution  with  certain  parameter  values,  one 
constructs  the  distribution  function  F(x;g)  at  these  para¬ 
meter  values,  and  determines  how  well  it  fits  the  distribu¬ 
tion  function  of  the  sample  (called  the  empirical  distribu¬ 
tion  function,  or  EDF)  by  some  previously  defined  measure  of 
fit.  Commonly,  the  goodness  of  fit  measure  is  some  measure 
of  the  distance  between  F(x;g)  and  the  EDF.  Minimum 
distance  estimation  merely  takes  as  its  estimate  of  £  those 
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values  which  minimize  the  distance  between  F(x;g)  and  Sn(z). 

Minimum  distance  estimation  requires  that  three  things 
be  specified:  the  family  of  distribution  functions  F(X;g), 
a  rule  for  obtaining  the  empirioal  distribution  function, 
denoted  Sn(z) ,  and  a  measure  of  the  distance  between  F(x;£) 
and  Sn(x).  Previous  applications  to  other  families  of  dis¬ 
tribution  have  already  been  mentioned;  this  thesis  deals 
with  the  beta  family  of  distributions.  There  are  several 
EDF's  which  can  be  used,  among  them  the  1/n  step  function 
and  median  ranks.  The  distance  measure  is  of  crucial  impor¬ 
tance;  often,  distance  estimators  are  identified  by  the  name 
of  the  distance  measure  us*.d.  They  will  be  described  in 
more  detail. 

The  most  common  distance  measures  are  descibed  in  Parr 
and  Schucany;  the  following  definitions  are  from  this  paper 
(Sef  21:7-8).  The  first  is  the  weighted  Kolmogorov  distance 

Dg ( Sn, F)  -  *"j|sn(x)-F(x;0) |g(F(x;0) )  (2.12) 

where  'sup*  signifies  the  least  upper  bound  and  g  is  a 
weighting  function.  This  statistic  should  be  familiar  to 
those  experienced  with  the  Eolmogorov-Smirnov  goodness  of 
fit  test  (Kef  3:347).  Another  measure  developed  from  a 
goodness  of  fit  statistie  is  the  weighted  Crsmer-von  Mises 
distance 

W*(Sn,F)  -  5(Sn(x)-F(x,$) )2g(F(x.f) )dF(x.*>  (2.13) 


i 
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where,  ijain,  (  it  i  weighting  function.  Use  of  uniform 

weighting,  defines  the  unweighted  Cramer-von  Mises 

2  1 
(CVM)  measure  W  (Sn,F).  The  weighting  scheme  5(F)  «  gr---r-, 

r  \  1*“F  ) 

0<F<1  defines  the  Anderson-Darling  distance  measure,  which 
2 

is  denoted  A  (Sn,F).  Sniper's  maximal  interval  probability 
distance  is  given  by  the  equation. 


V(Sn.F)  “  -I<I<b<«| (Sn(b)-Sn(a))  -  ( F (b ;fi) -F ( a ; 0) ) j  (2.14) 
Last,  a  general  class  of  distance  measures  is  defined  by 


Z.  (Sn,F)  « 

aJ(Sn(X)-F(X;0) )2dF(X;0)+b[J(Sn<X)-F(X;0) )dF(X;0)]2  (2.15) 


This  class  includes  the  CVM  measure  when  a  =  0  and  b»l, 

2 

Watson's  measure,  denoted  U  (Sn,F)  when  a  =  1  and  b*l,  and 
Chapman's  measure,  when  a - 0  and  b*l .  The  previous  AFIT 
studies  which  were  mentioned  have  used  the  Kolmogrov, 
Cramer-von  Mises,  and  Anderson-Darling  distance  measures. 
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III.  The  Beta  Distribution 


The  beta  distribution  is  becoming  more  and  more 
widely  used  in  applied  statistical  analysis  in 
many  business  disciplines.  For  example,  in  finanee 
the  beta  distribution  has  been  employed  in  an 
attempt  to  measure  the  probability  of  payment  or 
default  in  a  credit  granting  decision.  In  manage¬ 
ment,  the  beta  distribution  is  often  used  in  PERT. 
And  in  marketing,  the  beta  distribution  is  fre¬ 
quently  employed  in  Markovian  b r and- s w i t ch i ng 
models  when  transition  probabilities  are  taken  to 
be  random  variables  rather  than  parameters  (Ref 
8:1)  . 


These  are  just  a  few  ways  in  which  the  beta  distribution 
is  applied  in  the  "real  world."  However,  work  on  estima¬ 
tion  of  the  fonr  parameter  beta,  with  all  four  parameters 
unknown,  is  scarce;  in  preparation  for  this  thesis,  only  one 
paper — by  Glenn  E.  Tarr  (Ref  24) — was  found  which  dealt  with 
estimation  of  all  four  parameters.  Virtually  all  of  the  work 
done  on  estimation  of  the  beta  that  this  author  encountered 


either  dealt  only  with  the  two  parameter  version  or  assumed 
that  the  other  two  parameters  were  known  constants. 

This  chapter  deals  with  the  four  parameter  beta  distri¬ 
bution:  what  it  is,  and  how  the  three  estimation  techniques 
described  in  Chapter  II  may  be  applied  to  it.  The  first 
section  of  this  chapter  defines  the  beta  family  of  proba¬ 
bility  distributions  and  reviews  some  of  its  character¬ 
istics.  The  second  section  descibes  the  method  used  to  get 
preliminary  estimates  for  two  of  the  parameters,  which  are 
required  in  applying  the  three  estimation  procedures  in 
Chapter  II.  The  final  three  sections  of  this  chapter  consi¬ 
der  how  the  method  of  moments,  maximum  likelihood,  and 
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minimum  distance  are  applied  to  the  beta  distribution. 


General 

This  thesis  applies  the  three  estimation  techniques 
explained  in  the  previous  chapter  to  the  beta  family  of 
probability  distributions.  The  most  general  form  of  the 
probability  density  function  (pdf)  is  referred  to  as  the  four 
parameter  beta  distribution,  and  has  the  form  (Ref  14:37): 


f (y;P,Q, A,B) 


1 

bTpTq) 


(Y-A)1*'1  (B-Y)Q-1/  (B-A)P+?-1  AIY<B 

(3.1) 


0  ,  otherwise 


The  parameters  A  and  B  define  the  range  over  which  the 
function  is  defined;  hence,  any  random  variate  from  this 
distribution  must  fall  between  A  and  B.  The  parameters  P 
and  Q  are  known  as  the  shape  parameters,  since  they  deter¬ 
mine  the  shape  of  the  graph  of  the  probability  density 
function.  Both  P  and  Q  are  required  to  be  strictly  greater 
than  zero,  and  B  must  be  strictly  greater  than  A. 
Exchanging  the  values  of  P  and  Q  cause  the  graph  to  be 
reflected  about  the  midpoint  of  the  line  segment  AB.  The 
function  B(P,Q)  is  defined  by  the  formula  (Ref  19:130): 

B(P.G)  -  |YP"1(l-Y)Q_1dY  -  (3.2) 


The  speoial  case  where  A«0  and  B”1  is  oommonly  known  as 
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the  standard  or  two  parameter  beta,  and  has  the  form  (Ref 
14:37): 


f(y;P.Q,)  -  < 


STpTqT  xP"1(1'x)a"1  *  °^1 


(3.3) 


0  , 


otherwise 


If  A  and  B  are  known,  a  random  variable  from  the  fonr  para¬ 
meter  beta  can  be  transformed  into  a  standard  beta  random 
variable  by  the  equation  (Ref  14:37): 


X  = 


Y-A 

B-A 


(3.4) 


Some  members  of  the  beta  family  have  specific  names 
attached  to  them.  When  Q»1  in  equation  3.1,  it  is  sometimes 
called  the  power  function,  or  in  equation  3.3,  the  stand¬ 
ardized  power  function  (Ref  14:37).  The  standard  beta  with 
P*Qml/2  is  known  as  the  arc-sine  distribution,  and  is  used  in 
random  walk  theory  (Ref  14:39).  then  P-Q-l,  the  beta 
distibution  reduces  to  the  well-known  continuous  uniform,  or 
rectangular,  distribution. 

The  cumulative  distribution  function  (cdf)  of  the  beta 

distribution  is  found  by  integrating  equation  3.1  from  A  to 

the  point  at  which  the  cdf  is  to  be  evaluated.  The  cdf  of 

the  standard  beta  is  commonly  called  the  incomplete  beta 

function,  and  is  denoted  I  (P,Q)  (Ref  19:131). 

y 

As  mentioned  in  the  introduction,  the  main  reason  that 
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the  beta  family  of  distr ibut ions  is  useful  in  fitting  empir¬ 
ical  distribution  functions  is  its  ability  to  take  on  many 
different  shapes.  A  few  of  these  shapes  are  presented  in 
Figure  3-1.  A  more  complete  collection  of  graphs  of  beta 
density  functions  is  included  in  Johnson  and  Koltz  (Ref 
14:42-44).  Vhen  both  P  and  Q  are  less  than  one,  the 
function  is  U-shaped.  Vhen  one  is  less  than  and  the  other 
greater  than  one,  it  is  J-shaped.  If  both  P  and  Q  are 
greater  than  one,  the  function  is  bell-shaped.  If  P  is 
greater  than  Q,  the  function  is  skewed  to  the  right,  the 

opposite  if  Q>P.  The  function  is  symmetric  about  (B-A)/2  if 

P  equals  Q. 

Obtaining  Starting  Values 

The  solution  methods  used  to  accomplish  the  three  esti¬ 
mation  techniques  explained  in  Chapter  II  all  require  an 
initial  estimate  of  the  parameters  A  and  B.  This  is  done 
through  in t erpol a t i on  performed  on  the  sample  points  as 

follows.  First,  the  sample  is  sorted  from  smallest  to 

lagest,  so  that  is  the  ith  order  statistic  (Ref  19:229). 

Then  the  median  rank  is  found  for  the  two  smallest  and  two 
largest  points.  The  median  rank  of  is  computed  using 

the  formula  (Ref  15:31): 

“<x(i)>  *  ;;H  <3-5> 

For  convenience,  MR(Z^^)  shall  be  denoted  7^. 

The  interpolation  method  is  displayed  graphically  in 


15 


►  IM 

JO.  00  0.50  1.00  1.50  £ToO  2.  SO  3.00  3.50 

i  i  i  i  i  i  a  i 


Figure  3-2.  In  finding  the  estimate  for  A,  the  slope  of  the 
line  connecting  the  first  two  points  is  calculated  by  the 
usual  formula: 


m 


(Y2-Y1)/(X(2)-X(i)) 


(3.6) 


The  estimate  for  A  is  the  point  at  which  this  line  inter¬ 
cepts  the  x-axis.  Using  this  slope  formula  on  this  lower 
portion  of  the  line,  and  then  solving  for  A,  provides  the 
'-following  : 

A 

A  -  X(1)-Y1/M  (3.7) 

-*  The  same  procedure  performed  on  the  largest  two  order 

statistics  gives  the  formula  for  estimating  B: 


B  -  ( 1-Y  )/m  +  X.  .  (3.8) 

n  UJ 

where  m  is  calculated  using  the  nth  and  n-lst  points,  in 
place  of  the  second  and  first  in  equation  3.6. 

To  the  author's  knowledge,  this  is  a  new  method  for 
finding  these  parameters  of  the  beta  distribution.  It  was 
not  mentioned  in  any  of  the  literature  that  was  read  in 
preparation  for  this  thesis.  It  should  be  noted  that  inter¬ 
polation  will  always  give  plausible  estimates  for  A  and  B; 

A 

that  is,  A  is  always  smaller  than  the  first  order  statistic, 
and  B  is  always  larger  than  the  last.  The  estimates  would  be 
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erpolation  for  Starting  Values 


expected  to  be  more  accurate  with  larger  sample  sizes. 
Consistency,  although  not  proven  here,  seems  apparent;  as  n 
increases,  the  first  two  order  statistics  would  draw  closer 
to  the  true  value  of  A,  and  therefore  the  1 i-sar  approxima¬ 
tion  would  better  fit  the  tail  of  the  true  cdf.  An  equi¬ 
valent  argument  applies  to  B  on  the  upper  side.  Intuitively, 
it  seems  reasonable  to  expect  the  estimate  of  A  to  be  more 

a 

accurate  that  B  when  the  pdf  is  skewed  left  —  that  is,  when 

A  A 

Q > p  --  and  to  expect  B  to  be  more  accurate  than  A  when  the 
pdf  is  skewed  right  —  when  P)Q.  This  would  occur  because 
the  skewness  causes  a  large  portion  of  the  data  to  be  at  one 
end  of  the  range,  providing  a  closer  interpolation  at  that 
end . 

Me thod  of  Moments 

The  method  of  moments  is  the  only  estimation  technique 
for  which  an  application  to  the  full  four  parameter  beta  was 
found  in  the  literature;  that  being  a  paper  by  Glenn  Tarr 
(Ref  24).  Moment  estimation  of  the  standard  beta  is  dis¬ 
cussed  in  detail  by  Fielitz  and  Myers  (Ref  8),  and  the 
formulas  for  the  two  moments  required  to  estimate  the  stan¬ 
dard  beta  are  available  from  this  and  various  other  sources 
(Ref  14,18). 

There  are  two  possible  approaches  to  performing  moment 
estimation  of  the  beta  distribution.  First,  the  estimates 
of  A  and  B  found  by  interpolation  could  be  kept  as  final 
estimates,  and  the  first  two  moments  used  to  find  P  and  Q  in 
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a  manner  similar  to  Fielitz  and  Myers.  The  second  method  is 


to  use  the  first  four  moments  to  acquire  method  of  moments 
estimators  for  all  four  of  the  parameters.  Both  of  these 
alternatives  will  be  evaluated  in  this  thesis^ 

The  first  method  requires  the  formulas  for  the  first 
two  moments  of  the  beta  distribution.  The  central  moments 
will  be  used;  the  formulas  are  as  follows  (Ref  14:44): 

-  Ed)  -  A  ♦  (3.9) 

P2  -  Var(I)  -  [(B-A)2PQ]/t(P+Q)  (P+Q+l)]  (3.10) 

These  formulas  are  then  equated  to  the  first  sample  moment, 

the  sample  mean  X,  and  the  second  sample  central  moment,  the 

2 

(biased)  sample  variance  s  .  These  equations  are  then 
solved  for  P  and  Q,  resulting  in  the  folowing  equations: 

P  -  [(X*)2(l-X*)]/S2*  -  X*  (3.11) 

Q  -  [(1-X*)2X*]/S2*  -  (1-X*)  (3.12) 

-  *  - 

where  X  “  (X-A)/(B-A)  is  the  standardized  sample  mean,  and 
S2*  -  S2/(B-A)2  is  the  standardized  sample  deviation.  Using 
the  interpolated  estimates  of  A  and  B,  the  values  of  P  and  Q 
thus  easily  found  from  the  data.  In  Tarr's  paper,  he 
suggests  using  the  two  moment  method  for  "relatively  large" 
samples,  and  suggests  the  first  and  last  order  statistics  as 
estimates  for  A  and  B.  (Ref  24:3). 
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The  above  technique  is  only  a  "partial"  method  of 
moments  estimation,  since  moments  are  not  used  to  estimate  A 
and  B.  Glenn  Tarr,  in  the  paper  referred  to  earlier,  esti¬ 
mated  all  four  parameters  by  method  of  moments,  using  the 
skewness  and  kurtosis.  The  population  skewness  and  kurtoais 
for  the  beta  distibution  are  given  by  the  formulas  (Ref 
18:44): 

K3  =■  t2(P-Q)  (P+Q+l)1/2]/[(PQ)1/2(P+Q+2)]  (3.13) 

K.  =  3(P+Q+1) [2(P+Q)2+PQ(P+Q-6) ]/PQ(P+Q+2) (P+Q+3)  (3.14) 

4 

In  Tarr's  paper,  he  used  a  shifted  kurtosis;  his 
population  kurtosis  formula  is  equivalent  to  subtracting 
three  from  equation  3.14.  His  formula  for  the  sample 
skewness  and  kurtosis  are  as  follows  (Ref  24:8): 

*3  -  [  u3Tj  -3nTa  Tj  +  2Tj3  ]  /  SD3  n  ( n-1 )  (n-2 )  (3.15) 

K.  -  [(n3+n2)T.-4(n2+n)T,T  -3 (n2-n)T2+12nT„T2-6T? ]/ 

4  4  31  2211 

[SD4n(n-l) (n-2) (n-3) ]  (3.16) 


Where  T  -  >X? 

1  ifel1 


and  SD  is  the  sample  atandard  deviation. 


Since  the  skewness  and  kurtosis  involve  only  P  and  Q, 
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setting  these  formulas  equal  to  the  sample  skewness  and 
kurtosis  respectively  provides  two  equations  in  two  unknowns 
which  may  be  solved  by  numerical  techniques.  Once  these 
values  are  found,  A  and  B  are  estimated  using  equations  8 
and  9;  solving  for  A  and  B  rather  than  P  and  Q  leads  to  the 
formulas : 


«  _  o  ~  ^  ^  *  1/2 

A  *  X  -  (  [S‘4P(P+Q+l)]/Q)-l/i4  (3.17) 

A  A  A  A  A  A 

B  =  [X(P+Q) -AQ] /P  (3.18) 

Thus,  the  method  of  moment  estimators  have  been  formed  for 
all  four  parameters. 

Maximum  Like  1 ihood 

To  the  author’s  knowledge,  maximum  likelihood  estima¬ 
tion  of  the  complete  four  parameter  beta  distribution  has 
not  yet  been  developed.  No  reference  to  an  ML  technique 
which  estimates  A  and  B  were  found  in  the  literature.  This 
thesis  will  deal  with  a  "partial"  maximum  likelihood  esti¬ 
mation  process;  that  is,  one  which  keeps  the  interpolated 
estimates  of  A  and  B  as  constants  and  estimates  only  P  and  Q 
by  maximum  likelihood. 

The  technique  used  for  maximum  likelihood  estimation  of 
the  beta  is  that  developed  by  Gna na de s i kan,  Pinkham,  and 
Hughes  in  1967  (Ref  11).  It  dealt  with  the  standard  beta, 
and  performed  the  estimation  using  smallest  order  statistics. 


22 


The  resalting  simultaneous  equations  mere  solved  by  Newton's 
method,  and  it  was  stated  that  "The  starting  value  a. ..used 
are  crucial  for  the  efficient  convergence  of  the  iterative 
scheme"  (Ref  11:611). 

Beckman  and  Tietjen  picked  up  on  Gnanade s ikan,  et  al., 
and  developed  a  solution  method  which  is  "fast,  simple," 
and  for  which  "No  starting  values  are  required  and  no 
convergence  problems  have  been  encountered"  (Ref  2:254). 
The  likelihood  equations  to  be  solved  are  given  as  follows 
(Ref  2:253): 


where 


5 (P)  -  ?(P+Q)  -  lnG1 


5(Q)  -  5(P+Q)  -  lnG„ 


(3.19) 

(3.20) 


A)  / (B-A) ] 


1/  n 


(3.21) 


n _ 

■i  -JT“b-x‘ 


)  / (B-A) ] 


1/n 


(3.22) 


?(Z)  -  ^|-ln(f(Z))  (Ref  11:609)  (3.23) 


In  order  to  solve  the  simultaneous  equations  3.15  and 
3.16,  Beckman  and  Tietjen  used  the  following  procedure. 

A  A 

First,  equation  3.16  was  solved  for  $(P+Q).  This  was  sub- 
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stituted  into  3.15,  which  van  then  solved  for  g(P).  Then, 
the  inverse  of  ((*)  *as  taken  of  each  aide,  providing  an 
equation  for  P.  This  equation  tat  then  substituted  for  P  in 
equation  3.16,  leading  to  the  following  equation  (Kef  2:254): 

«<«>-?{ lC~1(lafli-lmG2«e(0) )J*Q)-lnG2  -  0  (3.24) 

The  root  of  this  equation  was  found  by  the  tenant  nethod; 
this  sane  nethod  was  used  to  evaluate  (~1(*).  The  funetion 
?(•)  was  evaluated  using  an  appr oz i ■ a t i on  given  in  the 
reference.  The  secant  aethod  "requires  the  user  to  specify 
an  interval. ..within  which  the  root  is  located"  (Ref  2:254). 

A  A 

Beckaan  and  Tietjen  provide  tables  froa  which  P  and  Q  can  be 
found  for  given  values  of  Gj  and  G a  listing  of  their 
ooaputer  prograa  is  also  provided.  This  thesis  uses  that 

A  A 

prograa  to  find  the  aaziaua  likelihood  estiaates  of  P  and  Q, 
given  the  interpolated  estiaates  of  A  and  B. 

In  a  coaaent  on  the  Fielitz  and  Myers  paper  (Kef  8), 
which  favored  the  aethod  of  aoaents  for  estiaating  the  beta 
distribution,  and  on  a  rebuttal  by  Roaesburg  (Ref  22),  which 
supported  aaziaua  likelihood,  Eottas  and  Lau  (Ref  16)  wrote 
an  ezcellent  article  which  suaaarizes  both  classical  aethods 
of  estiaating  the  beta,  provides  historical  perspective,  and 
coaaent  s  on  which  is  the  better  technique  for  estiaating  P 
and  Q.  They  state  that  Fisher,  the  initial  developer  of  the 
aaziaua  likelihood  aethod,  "aatheaatioally  proved  that  the 
inherent  variance  of  an  MM  estiaator  [of  the  beta  distribu- 
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tion]  is  always  greater  than  or  equal  to  that  of  the  corre¬ 
sponding  ML  estimator  and  approaches  the  latter  only  in  near 
norsial  cases"  (Sef  16:529).  In  an  article  which  focused  on 
the  small  sample  case,  Dishon  and  Veiss  reached  a  similar 
conclnsion;  they  compiled  a  table  comparing  the  MM  and  ML 
estimators  for  various  parameter  values  and  sample  sizes,  and 
concluded  that  "with  few  exceptions  the  ML  estimator  is  more 
accurate  for  low  n  than  is  the  moment  estimator"  (Ref  5:4). 
In  order  for  the  results  of  this  thesis  to  be  consistent  with 
these  findings,  the  "partial"  maximum  likelihood  estimates 
of  P  and  Q  would  be  expected  to  be  more  accurate  than  the 
"partial"  moment  estimates. 

Minimum  Distance 

This  thesis  represents,  to  the  author's  knowledge,  the 
first  attempt  at  estimation  of  the  parameters  of  the  beta 
distribution  by  the  minimum  distance  technique.  It  is  the 
first  attempt  at  AFIT  to  estimate  more  than  one  parameter  by 
minimum  distance.  Although  some  of  the  AFIT  theses, 
mentioned  earlier  (Refs  4,  13,  17,  20),  did  deal  with  more 
than  one  parameter,  only  one  parameter  was  estimated  by 
minimum  distance;  this  estimate  was  then  used  to  improve  the 
estimates  of  the  other  parameters  found  by  other  methods. 
This  thesis  will  attempt  to  find  minimum  distance  estimates 
for  all  four  parameters  of  the  beta  distribution. 

The  empirical  distribution  function  to  be  used  in  this 
thesis  is  the  1/n  step  function,  which  assigns  the  ith  point 
of  the  ordered  sample  the  value  i/n.  The  Cramer-von  Mises 
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distance  measure  will  be  nsed.  When  applied  to  the  1/n  step 
function,  the  CVM  measure  defined  in  equation  2.13,  with 
uniform  weighting,  reduces  to  (Kef  23:731): 


W2 ( Sn, F) 


FU^- 


21-112 
_2  n~  J 


(3.25) 


Where  F(X.)  is  the  cdf  evaluated  at  the  ith  sample  point. 
This  same  source  provides  similar  formulas  for  applying  many 
of  the  other  distance  formulas  mentioned  in  Chapter  II  to 
the  1/n  step  function  EDF. 

The  process  used  to  find  the  minimum  distance  estimates 
is  as  follows.  First,  the  initial  estimates  of  A  and  B  are 
found  by  interpolation.  Then  the  moment  equations,  3.11  and 
3.12,  are  used  to  get  starting  values  for  P  and  Q.  Holding 
A  and  B  fixed,  equation  3.25  is  minimized  for  P  and  Q  (the 
parameters  P,  Q,  A,  and  B  are  implicitly  contained  in 
F(Ii)).  After  this  minimization  is  accomplished,  P  and  Q  are 
held  constant  at  the  new  values,  and  3.25  is  minimized  for  A 
and  B.  The  resulting  values  of  P,  Q,  A,  and  B  are  the 
minimum  distance  estimates  of  these  parameters. 


IV :  Monte  Carlo  Ant 1 vs i » 


This  chapter  deals  with  the  specific  method  used  to 
perform  the  comparison  of  estimation  techniques  which  is  the 
main  purpose  of  this  thesis.  The  comparison  is  performed 
using  Monte  Carlo  analysis.  There  are  basically  three  steps 
to  a  Monte  Carlo  anaysis  of  an  estimation  method.  First, 
random  samples  from  the  distribution  to  be  estimated  are 
generated.  Second,  the  parameters  of  the  distribution  are 
estimated  from  these  samples.  Third,  the  estimations  are 
evaluated  as  to  how  well  they  approximated  the  true  distri¬ 
bution.  This  chapter  will  discuss  each  of  these  steps  in 
detail . 

Since  there  are  a  vast  amount  of  data  and  large  numbers 
of  calculations  involved  in  Monte  Carlo  analysis,  use  of  a 
high-speed  computer  is  a  necessity.  A  Control  Data  Corpor¬ 
ation  (CDC)  computer  system,  located  at  Aeronautical  Systems 
Division,  Vright-Patterson  Air  Force  Base,  Ohio,  was  used  in 
performing  the  analysis  for  this  thesis.  In  programming 
each  of  the  three  steps  outlined  above,  existing  software 
was  used  whenever  possible;  specifically,  subroutines  from 
the  International  Mathematical  Statistics  Library  (IMSL) 
were  widely  used.  The  reader  should  refer  to  the  IMSL 
manual  (Ref  12)  if  specific  information  about  these  routines 
is  desired. 

Generation  sJ.  P ». U 

In  order  for  the  comparisons  made  by  this  thesis  to  be 
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valid,  they  should  be  made  for  a  number  of  sample  sizes  and 
several  different  combinations  of  parameter  values.  Five 
sample  sizes  were  nsed:  4,  8,  12,  16,  and  20.  For  each  of 
these  sample  sizes,  three  combinations  of  P  and  Q  were  nsed: 
P-3,  Q-3;  P-9 ,  Q-4;  and  P-1,  Q-2  (refer  to  Figure  1  in 
Chapter  III  for  a  graph  of  these  curves).  In  order  to  save 
on  computer  time,  only  one  combination  of  values  was  used 
for  A  and  B;  different  values  are  not  expected  to  have  an 
affect  on  estimation  ability,  since  such  a  change  would  only 
result  in  a  linear  translation  along  the  axis.  The  values 
A-2,  B-10  were  arbitrarily  chosen  for  this  analysis. 

For  each  of  the  15  oases  (five  sample  sizes  times  three 
P,  Q  combinations),  1000  samples  were  generated  for  use  in 
estimation.  Generation  of  beta  random  variates  was  accom¬ 
plished  using  the  IMSL  routine  GGBTR,  which  provides  an 
array  of  standard  beta  random  variates  for  a  specified  P,  Q, 
and  sample  size.  The  resulting  array  was  then  sorted  using 
the  IMSL  routine  VSRTA.  The  random  variates  were  then 
unstandardised,  using  equation  3.4  when  solved  for  T  instead 
of  X,  so  that  the  random  variates  now  formed  a  sample  from 
the  desired  four  parameter  beta  distribution. 

Since  the  interpolated  estimates  of  A  and  B  are  needed 
for  all  three  estimation  techniques,  they  were  calculated  in 
the  same  program  which  performed  the  data  generation.  The 
method  is  exactly  as  described  in  Chapter  III;  the  median 
ranks  of  the  first  two  and  last  two  points  of  the  previously 
sorted  sample  were  calculated  using  equation  3.5,  the  slopes 


were  found  using  equation  3.6  and  a  similar  expression  for 
the  two  highest  points,  and  then  equations  3.7  and  3.8  were 
applied  to  calculate  the  estimates.  The  mean  and  standard 
deviation  of  each  saaple  were  also  calculated  at  this  time. 
The  number  of  replications  (1000  in  this  analysis),  sample 
size,  true  values  of  P,  Q,  A,  and  B,  and  1000  sets  of 
replication  number,  list  of  random  variates,  interpolated 
estimates  of  A  and  B,  mean,  and  standard  deviation  were 
stored  on  permanent  file  for  use  by  each  of  the  three  esti¬ 
mation  programs.  A  listing  of  the  program  which  performed 
this  data  generation  is  provided  in  Appendix  C. 

Computer iz at  ion  of  Est imat ion  Techniques 

Mo  thod  of  Moments.  In  the  previous  chapter,  it  was 
explained  that  tvo  sets  of  moment  estimators  would  be  calcu¬ 
lated.  They  are  the  'partial'  MM  estimators,  using  the 

a  a 

interpolated  estimates  of  A  and  B  and  finding  P  and  Q  by  the 
first  two  moments,  and  the  'full'  MM  estimators,  which  use 
the  first  two  moments,  skewness,  and  kurtosis  to  calculate 

AAA  A 

P,  Q,  A  and  B.  The  computer  program  written  to  do  this  is 
included  in  Appendix  D. 

The  'partial*  method  of  moments  estimates  were  found 
first;  this  was  done  through  direct  application  of  equations 
3.11  and  3.12.  The  ’full*  MM  estimation  technique  is  based 

A 

on  the  procedure  suggested  by  Tarr  (Ref  24).  Finding  P  and 

A 

Q  from  the  skewness  and  kurtosis  involves  solving  a  system 
of  two  nonlinear  equations  in  two  unknowns.  The  IMSL 
routine  ZSCNT  was  used  to  ac c o m p 1 i sh  *  th  i  s ,  using  the 
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'partial’  MM  estimates  of  P  and  Q  as  starting  raises.  If 
the  program  were  to  attempt  to  estimate  either  P  or  Q  as 
less  than  zero,  the  program  was  designed  to  sse  the 
'partial'  moment  estimators  for  that  sample.  When  the  pro¬ 
gram  was  executed,  this  was  found  to  occur  for  virtually 
every  sample;  when  attempts  to  remedy  this  failed,  Tarr’s 
approach  was  abandoned.  These  results  will  be  discussed 
further  in  the  next  chapter. 

Maximum  Litel ihood.  The  program  used  to  perform  the  ML 
estimation  is  taken  from  the  article  by  Beckman  and  Tietjen 
(Ref  2:258).  The  only  changes  made  were  for  input  of  data, 
calculation  of  and  G and  output  of  results.  The  reader 
should  refer  to  the  source  article  for  a  program  listing; 
since  the  changes  were  superficial,  the  listing  will  not  be 
provided  here. 

Minimum  Distance.  Computerization  of  the  minimum  dis¬ 
tance  method  follows  directly  from  the  process  outlined  in 
Chapter  III.  After  the  data  was  read  in,  the  IMSL  routine 
ZXMIN  was  used  to  perform  the  minimization.  This  routine 
minimizes  a  function,  in  this  case  equation  3.25,  for  an 
array  of  parameters.  Parr  and  Schucuny  used  this  routine  in 
their  analysis  (Ref  21:21).  Using  the  two-moment  estimates 
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as  starting  values,  P  and  Q  were  used  first  as  input  para- 

A  A 

meters  for  ZXMIN,  while  A  and  B  were  held  constant  through 
use  of  a  COMMON  statement.  Vhen  this  minimization  was  com- 

A  A 

pleted,  ZXMIN  was  used  a  second  time,  this  time  with  A  and  B 

0 
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as  input  parameters,  using  the  interpolated  estimates  as 

A  A 

starting  values,  while  the  values  of  P  and  Q  found  in  the 
first  minimization  were  held  constant  using  a  COMMON  state¬ 
ment.  The  estimate  of  A  was  set  equal  to  the  first  order 
statistic  if  ZXMIN  attempted  to  estimate  A  greater  than  X^^; 

A  A 

similarly,  ^(n)  was  used  for  B  if  ZXMIN  attempted  to  set  B  < 
X^nj.  The  listing  of  the  computer  program  used  to  perform 
the  minimum  distance  estimation  is  included  in  Appendix  E. 

Comparison  of  Est imat ion  Techniques 

The  third  step  in  the  Monte  Carlo  analysis  is  to 
evaluate  the  estimates;  this  evaluation  will  be  used  as  a 
basis  in  comparing  the  estimation  techniques.  There  are  two 
approaches  which  could  be  used  for  this  evaluation.  The 
first  approach  would  be  to  individually  measure  how  close 
the  estimates  of  each  parameter  are  to  the  true  value  of 
that  parameter.  The  measure  commonly  used  for  this  type  of 
evaluation  is  the  mean  square  error  (MSE).  The  second 
approach  is  to  calculate  an  overall  measure  of  how  well  the 
estimated  distribution  fits  the  true  distribution.  A 
distance  measure  of  the  type  outlined  in  Chapter  III  is  an 
appropriate  measure  for  this  approach. 

This  thesis  used  both  of  these  approaches;  the  mean 
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square  errors  of  P,  Q,  A  and  B  were  found,  and  the  mean  CVM 
distance  between  the  estimated  and  true  cdf  calculated,  for 
each  estimation  method  and  each  of  the  15  cases  of  sample 
size  and  P,  Q  values.  The  program  written  to  perform  both 
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of  these  evaluations  is  listed  in  Appendix  F.  These  two 
approaches  will  now  be  explained  in  more  detail. 

M  e.  a.  n  Sitter  e  Errors .  The  mean  square  error  is  a 
measure,  based  on  repeated  estimation,  of  how  well  an  esti¬ 
mation  method  has  estimated  a  given  parameter.  The  formula 
for  calculating  the  mean  square  error  is  as  follows: 

MSB ( 0 )  =  [|(0i-e)2]/N  (4.1) 

i=l 

A 

where  6  is  the  true  value  of  the  parameter,  6^  is  the  ith 
estimate,  and  N  is  the  number  of  times  the  estimation  is 
performed — in  this  analysis,  N*1000.  A  difficulty  in  using 
MSE's  when  many  parameters  are  estimated  is  that  conflicting 
results  are  possible;  estimation  method  A  may  have  the 
smallest  HSE  for  parameter  1,  while  method  B  has  the 
smallest  MSE  for  parameter  2  for  the  same  case.  Another 
potential  difficulty  with  MSE's  is  that  they  are  not  scale 
invariant;  the  same  size  MSE  may  be  highly  significant  for  a 
small  valued  parameter,  but  insignificant  for  a  larger  one. 
This  can  complicate  comparison  of  MSE’s  for  different  para¬ 
meter  values. 

gVM  Distance.  The  Cramer-von  Mises  (CVM)  goodness  of 
2 

fit  statistic  W  was  defined  by  equation  2.13,  using  the 
uniform  weighting  £(*)~1.  In  this  usage,  however,  the  esti¬ 
mated  cdf  F  is  used  in  place  of  Sn;  the  integral  is 
multiplied  by  the  sample  size  to  form  the  actual  distance 
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measure  (Ref  23  )  . 


Since  dF ( x ) =--dxa f ( x ) dz  ,  the  formula  for 
dz 

A 

the  distance  between  the  estimated  cdf  F  and  the  true  cdf  F 


W2(F,F>  =  nT(F(X,fi)-F<X,a) )2f (X.g)dX  (4.2) 


where  ©= ( P , Q, A, B ) ,  ( P, Q, A, B) ,  and  f(x,9)  is  the  true  pro¬ 
bability  density  function  (pdf).  This  integral  was 
evaluated  using  16  point  Gaussian  quadrature.  This  solution 
technique  requires  that  the  integral  be  over  the  limits  -1 
to  1;  this  is  satisfied  through  the  identity  (Ref  10:221): 


(4.3) 


where 


(b-a ) t+b+a 


Gaussian  quadrature  estimates  an  integral  with  limits 
-1  and  1  using  a  weighted  sum,  as  follows  (Ref  1:916): 


Tg ( x ) dx  ~  >w  g(x. : 
-1  i-ll 


(4.4) 


where  w^  a.nd  x^  are  the  ith  Gaussian  weights  and  quadrature 
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points  respectively,  and  n  is  the  number  of  points;  for  this 
analysis,  n*16.  The  appropriate  weights  and  points  were 
taken  from  the  Handbook  of  M  a£he  m  ESSilifiSl  (Ref 

1:916).  Since  it  is  possible  that  the  estimates  of  A  and  B 
will  be  inside  of  the  true  values,  some  of  the  quadrature 
points  may  be  in  places  where  F  is  not  defined.  This  problem 
was  overcome  by  defining  F(x)  to  be  zero  when  X<A  and  one 
when  X>B.  This  situation  is  depicted  graphically  in  Figure 
3.  Having  the  estimates  outside  of  the  true  values  of  A  and 
B  presents  no  problem,  since  the  integral  is  calculated  only 
between  the  true  values. 

The  CVM  statistic  was  calculated  as  just  described  for 
each  of  the  1000  replications.  The  sample  mean  and  standard 
deviation  of  the  CVM  statistics  can  therefore  be  calculated 
using  the  usual  formulas: 

W2=  [jwJJ/1000  (4.5) 

i*l 

sd(w2)  - 

2 

where  is  the  CVM  distance  of  the  ith  estimation. 

Since  the  number  of  replications  is  large,  the  central 
limit  theorem  implies  that  the  mean  CVM  statisic  is  appoxi- 
mately  normally  distributed  (Ref  19:252).  Therefore,  a 
confidence  interval  for  the  mean  CVM  statistic  may  be  ealeu- 


iflOO  _ 

2(1 i-f 

*  —  ■* 


2)2/1000 


(4.6) 
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lated  (Ref  19:277).  The  formula  for  the  confidence  interval 
is  as  follows  (Ref  2S:195): 

P(W2-Za/2SD(W2) /ift  <  E(W2)  <  W2+Za/2SD(W2) /yS)-l-o  (4.7) 

where  a  is  the  significance  level,  Za/2  is  the  value  of  the 
standard  normal  leaving  an  ares  of  a/2  to  the  right,  and  N 
is  the  number  of  replications,  in  this  case,  1000. 

The  CVX  distance  has  the  advantage  over  the  HSE  of 
being  a  single  measure  of  fit  for  all  four  parameters,  and 
also  of  being  scale  invariant  with  respect  to  the  size  of 
the  parameter  values.  One  disadvantage,  which  goes  along 
with  the  first  advantage,  is  that  information  about  the 
individual  parameters  is  lost.  At  times,  one  may  wish  to 
know  which  method  worked  better  on  a  particular  parameter; 
for  this,  the  MSB  is  the  better  measure.  It  should  be  clear 
that  there  is  no  bias  introduced  by  using  the  CVN  measure 
both  in  finding  the  minimum  distance  estimate  and  then  again 
in  evaluating  this  estimate.  That  is  because  during  minimi¬ 
zation  the  distance  between  the  estimated  distribution  func¬ 
tion  and  the  empirical  distribution  function  is  measured, 
while  during  evaluation  the  distance  measured  is  between  the 
estimated  distribution  funotion  and  the  true  distribution 
function. 
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V.  Remits  ££d 


Results  o f  Comparisons 

The  numerical  results  of  the  two  comparison  approaches 
are  summarized  in  the  appendices.  There  are  fifteen  tables 
in  each  appendix,  one  for  each  case  of  sample  size  and 
parameter  values.  Appendix  A  contains  the  tables  of  mean 
square  errors.  The  three  estimation  methoda;  method  of 
moment  estimation  (MME),  maximum  likelihood  estimation 
(MLE),  and  minimum  distance  estimation  (MDE)  are  listed  down 

4*  A  A 

the  side.  The  mean  square  errors  of  P  (MSEP) ,  Q  (MSEQ),  A 
(HSEA) ,  and  B  ( MSEB )  are  then  listed  across  the  row  for  each 
method.  Appendix  B  contains  the  means,  standard  deviations, 
and  confidence  intervals  for  the  CVM  statistics.  Again,  the 
three  methods  are  listed  down  the  side.  Across  each  row 
are,  in  order,  the  sample  mean  of  the  CVM  statistics  (MCVM), 
the  sample  standard  deviation  of  the  CVM  statistics  (SDCVM), 
the  lower  limit  and  the  upper  limit  of  the  95%  confidence 
interval  of  the  mean  of  the  CVM  distance  (95%  C.I.).  It 
should  be  noted  that  SDCVM  is  an  estimate  of  the  population 
standard  deviation;  it  must  be  divided  by  /1 00 0  to  get  an 
estimate  of  the  atandard  deviation  of  the  mean. 

The  reader  may  note  that,  in  some  cases,  the  MSE's  and 
the  mean  CVM  statistic  seem  to  contradict  each  other;  that 
is,  one  method  may  have  MSB’s  equal  to  or  smaller  than 
another  for  all  parameters,  but  the  other  method  has  a 
smaller  mean  CVM  statiatio.  The  case  of  N*4,  P”  9 ,  Q- 4  is 

such  a  case;  both  MME  and  MLE  are  the  Interpolated  eatimates 
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of  A  and  B,  so  these  MSE’s  are  equal.  However,  the  MSE'S  of 

A  A 

both  P  and  Q  are  smaller  for  the  MLE,  bat  the  mean  CVM  is 
smaller  for  the  MME. 

This  phenomenon  can  occur  because  of  the  way  in  which 
the  four  parameters  of  the  beta  distribution  interact.  The 
reader  should  refer  to  Figure  1  in  Chapter  III,  and  review 
the  shape  of  the  beta  when  P*9  and  Q»4.  The  curve  remains 
very  close  to  the  z-azis  until  about  1/4  of  the  way  from  A 
to  B.  For  this  reason,  A  is  usually  interpolated  at  about 
this  point  on  the  azis.  Using  this  as  the  lower  limit, 
however,  the  distribution  is  much  more  symmetric  than  when 
the  true  value  of  A  is  used.  Therefore,  the  values  of  P  and 
Q  which  best  approzimate  the  curve  for  the  interpolated 
estimates  of  A  and  B  are  different  from  the  true  values  of 
P  and  Q.  A  set  of  values  which  are  further  away,  in  an  MSE 
sense,  from  the  true  values  than  another  set,  may  thus  be 
closer  in  a  distance  sense  to  the  true  distribution  due  to 
the  estimation  of  A  and  B. 

Since  the  MSB's  and  the  mean  CVM  statistics  contradict 
each  other  at  times,  it  is  necessary  to  choose  one  to  serve 
as  a  basis  of  comparison  between  the  estimation  methods. 
The  mean  CVM  statistic  will  be  used  since  it  is  an  overall 
measure  and  does  not  depend  on  the  particular  parameter 
values.  The  method  with  the  smallest  mean  CVM  statistic 
provides  the  closest  fit,  on  the  average,  to  the  true  dis¬ 
tribution,  Using  this  criterion,  the  moment  estimate  is 
ranked  first,  followed  by  mazimum  likelihood  and  then  mini- 
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mum  distance,  for  all  sample  sizes  in  cases  P«  Q- 3  and  P-9, 
Q-4 ;  and  when  N-4  for  P-1 ,  Q-2 .  For  the  other  four  sample 
sizes  with  P-1,  Q-2 ,  maximum  likelihood  has  the  smallest 
mean  CVM,  with  NME  next,  and  then  MDE.  THe  MSE,  however,  is 
saaller  for  MLE  than  for  NME  in  most  of  the  cases. 

Using  just  the  point  estimates  of  the  mean  CVM  distance 
gives  no  indication  of  whether  the  differences  between  the 
MCVM  for  each  method  is  large  enough  to  be  significant.  For 
this,  the  95%  confidence  intervals  can  be  used.  If  the 
confidence  intervals  of  two  methods  overlap,  this  indicates 
that  the  mean  CVM  diatances  of  the  methods  are  not  signifi¬ 
cantly  different.  This  comparison  is  equivalent  to  perform¬ 
ing  a  t  test  of  the  difference  between  the  means.  Since 
there  are  three  means  being  considered,  comparing  these 
three  confidence  intervals  equates  to  performing  multiple  t 
teats;  for  this  reason,  the  effective  a  level  -  that  is,  the 
probability  of  finding  two  means  to  be  significantly  diffe¬ 
rent  when  they  are  not  -  is  actually  somewhat  higher  than 
0.05.  When  the  confidence  intervals  listed  in  Appendix  B 
are  compared,  they  show  that  the  95%  confidence  intervals 
for  the  MME  and  MLE  distance  measures  overlap  in  every  case. 
This  indicates  that  the  difference  between  them  it  not 
statistically  significant.  Comparing  confidence  intervals 
for  the  MDE  indicates  mixed  results.  In  some  cases  (N-4, 
P-3,  Q-3;  N-4,  P-1,  Q-2;  N-12,  P-9,  Q-4),  the  CVM  statistic 
of  the  minimum  distance  method  is  significantly  greater  than 
for  both  of  the  other  methods.  In  five  cases  (N-4,  P-9, 
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Q=4 ;  N=8,  P-9,  Q-4 ;  N-12,  P-3,  Q-3;  N-12,  P-1,  Q-2;  N-20, 

P-3,  Q*3),  the  CVM  statistic  for  MDE  is  significantly 

greater  than  the  snaller  of  the  other  distances,  bat  not  the 
larger  one.  In  the  other  seven  cases,  all  three  confidence 
intervals  overlap,  so  that  none  of  the  differences  are 
significant. 

Although  not  nsed  in  coapsring  overall  effectiveness, 
the  aean  square  errors  anst  be  used  if  effectiveness  in 
estiaating  a  particular  paraaeter  is  to  be  coapared. 

a  A 

Problems  arise,  however,  in  coaparing  the  MSE's  of  P  and  Q, 

A  A 

since  they  depend  on  the  values  of  A  and  B  as  described 
earlier.  However,  since  the  interpolated  estimates  of  A  and 
B  do  not  depend  on  other  parameters,  these  can  be  compared 
to  the  MDE  estimates  of  A  and  B  using  MSE's.  In  this  com¬ 
parison,  the  MDE  fares  better  than  in  the  previoua  para¬ 
graph.  Overall,  the  differences  in  the  MSE's  are  small. 
The  interpolated  estimates  are  better  in  all  cases  with 
sample  size  four.  For  sample  size  eight,  neither  method  has 
a  clear  superiority.  With  N-12,  the  MSB’s  using  minimum 

A 

distance  are  smaller  in  all  oases  except  for  A  when  P-1,  Q- 
2,  where  it  is  slightly  larger.  The  ainimua  distance  esti¬ 
mates  have  a  smaller  MSE  than  the  interpolated  estimates  of 
A  and  B  for  all  cases  with  sample  sizes  16  and  20. 

In  Chapter  III,  in  the  section  on  obtaining  starting 
values,  a  number  of  suppositions  were  made  concerning  the 

A  A 

interpolated  values  of  A  and  B.  The  results  in  Appendix  A 
can  now  be  uaed  to  test  these  suppositions.  Consistency  was 
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the  first  supposition;  the  Accuracy  of  the  estimates  was 
expected  to  increase  as  sample  size  increasd.  This  is 
supported  by  the  results.  The  mean  square  errors  of  both  A 
and  B  decrease  mono toni cal ly  as  the  sample  size  increases  for 
all  three  sets  of  parameter  values.  The  second  supposition 
was  that  left-skewed  distributions  would  provide  better 
estimates  of  A,  while  right-skewed  distributions  would  pro¬ 
vide  better  estimates  of  B.  This  is  also  borne  out  by  the 
results.  For  the  case  P“9,  Q*  4 .  which  is  skewed  right,  the 

A  A 

MSE  of  B  is  always  much  less  than  the  MSE  for  A;  in  fact, 

A  A 

the  MSE  of  A  is  always  at  least  nine  times  the  MSE  of  B. 

A 

For  the  left-skewed  case,  P“l,  Q=2 ,  the  MSE  of  A  is  always 

A 

less  than  the  MSE  of  B  by  at  least  a  factor  of  seven.  In 

A  A 

the  symmetric  case,  P*Q»3 ,  the  MSE's  of  A  and  B  are  nearly 
equal  in  all  the  sample  sizes. 

Cone lus ions 

Does  the  president  of  the  Bertrand  Corporation  have  an 
answer  to  his  question?  lfhat  is  the  best  method?  Based 
upon  the  results  of  this  analysis,  and  for  the  range  of 
sample  sizes  considered,  the  method  of  moments  using  inter¬ 
polated  values  of  A  and  B  seems  to  the  the  best  choice  of 
the  three  methods  investigated.  It  provides  estimates  which 
fit  the  true  distribution  at  least  as  well  as  the  other  two 
methods,  and  is  more  easily  computed  than  either  the  methods 
of  maximum  likelihood  or  minimum  distance.  For  a  single 
sample,  the  moment  estimates  may  be  easily  found  using  a 
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desk  calculator.  The  method  of  maximum  likelihood  provides 
estimates  of  nearly  equal  accuraoy  to  the  method  of  moments, 
and  is  certainly  a  viable  alternative;  however,  the  computa¬ 
tion  is  more  involved  and,  for  an  exact  solution,  requires 
computerization.  The  minimum  distance  method,  when  applied 
to  all  four  parameters,  does  not  appear  to  be  as  good, 
especially  in  light  of  the  fact  that  it  requires  much  more 
computer  time  than  the  others  to  obtain  a  solution.  The 
fact  that  its  estimates  of  A  and  B  were  improvements  would 
seem  to  indicate  that  minimum  distance  might  be  successfully 
applied  to  the  location-type  parameters  of  the  beta  distri¬ 
bution. 

The  four  moment  estimation  technique  suggested  by  Tarr 
(Ref  24)  is  apparently  not  as  straightforward  as  he  seems  to 
believe.  Careful  re-reading  of  this  paper  revealed  that  the 
author  apparently  did  no  actual  verification  or  Monte-Carlo 
analysis  of  this  technique  at  all.  His  tables  seem  to  have 
been  generated  merely  by  choosing  values  of  P  and  Q, 
plugging  them  into  the  formulas  for  the  population  skewness 
and  kurtosis,  and  tabling  the  resulting  values.  More  work 
is  required  on  this  method  if  it  is  to  be  a  viable 
alternative  to  the  others  presented  herein. 

Rf  lSJL  E  .5  ether  Slgil 

As  stated,  investigation  into  the  viability  of  Tarr's 
method  is  a  possible  area  of  study.  Another  possibiliy 
would  be  extension  of  the  maximum  likelihood  method  in 
Beckman  and  Tietjen  (Ref  2)  to  all  four  parameters.  A  third 
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alternative  would  be  to  combine  MME  and  MLE  into  a  'hybrid' 
approach.  This  method  could  use  the  interpolated  values  of  A 
and  B  to  obtain  ML  estimates  of  P  and  Q,  just  as  done  in  this 

a  A 

thesis.  Then,  these  estimates  of  P  and  Q  could  be  used  in 
equations  3.17  and  3.18  to  obtain  new  estimates  of  A  and  B 
with  the  first  two  moments.  It  would  then  be  possible  to  use 
these  to  re-estimate  P  and  Q,  and  perhaps  loop  through  this 
procedure  until  the  desired  accuracy  is  achieved.  The  study 
would  have  to  determine  if  this  would  lead  to  significant 
improvements  over  the  methods  evaluated  in  this  thesis. 

Regarding  minimum  distance  estimation,  this  was  the 
first  attempt  at  applying  this  method  to  the  beta  distribu¬ 
tion,  and  it  should  not  be  abandoned  just  because  it  is  not 
yet  as  good  as  the  other  techniques.  There  is  still  work  to 
be  done.  One  area  that  could  be  explored  would  be  to  try 
other  distance  measures  to  see  if  they  may  lead  to  an  im¬ 
provement.  For  instance,  the  Anderson-Darling  statistic  may 
turn  out  to  be  better  for  estimating  A  and  B,  since  it  is 
more  heavily  weighted  at  the  tails  of  the  distribution. 
Another  possibility  would  be  to  only  estimate  one  parameter 
by  MDE.  One  may  wish  to  use  Tarr's  formulation  for  the  beta 
distribution,  which  uses  for  A  as  the  location  parameter, 
and  M,  which  equals  B-A,  as  a  range  parameter  (Ref  24:1).  X 

o 

A  A 

and  M  could  be  found  by  interpolation,  P  and  Q  by  MM  or  ML, 

A  A  A 

and  then  Xg  could  be  refined  by  MD,  keeping  M,  P,  and  Q 

A 

fixed.  This  new  XQ  could  then  be  used  to  improve  the  esti- 

A  A 

mates  of  P  and  Q.  This  would  also  cut  down  on  the  use  of 
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Appendix  A 


Tables  of  Mean  Square  Errors 


The  following  notation  is  used  in  this  Appendix. 


Te rm  Notat  ion 

Method  of  Moments  Estimation  .  MME 

Maximum  Likelihood  Estimation  .  MLE 

Minimum  Distance  Estimation  .  MDE 

a 

Mean  Square  Error  of  P  . .  MSEP 

Mean  Square  Error  of  Q  .  MSEQ 

A 

Mean  Square  Error  of  A  .  MSEA 

A 

Mean  Square  Error  of  B  .  MSEB 


Monte  Carlo  sample  size  is  1000  and  true  values  of  A 
and  B  are  2  and  10  for  all  tables. 
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TABLE  A- I 


Mean  Squire  Errors 

Sample  size  4 
True  P  3 
Trne  Q  3 


Mi£P 

MEfl 

MSEA 

M$PB 

MME 

3.742108 

3 .658086 

4.668322 

4.653648 

MLE 

2.739536 

2.649655 

4.668322 

4.653648 

MDE 

12.371356 

14.542017 

4.744504 

4.737503 

TABLE  A- I I 
Mean  Sqnare  Errors 
Sample  Size  4 


Trne 

P 

9 

Trne 

Q 

4 

MME 

MULE 

60.861580 

msbq 

8.601137 

MS, PA 

16.911807 

M§EB 

1.726146 

MLE 

55.594636 

6.942450 

16.911807 

1.726146 

MDE 

67.128382 

13.105965 

17.014121 

1.756985 

TABLE  A-III 

Mean  Sqnare  Errors 

Sample  size  4 

True  P  1 

Trne  Q  2 


MSEP 

MSEQ 

M1EA 

M1EB 

MME 

0.410244 

0.936122 

1.217973 

8.993774 

MLE 

0.626805 

0.645073 

1.217973 

8.993774 

MDE 

5.416876 

21.291772 

1.239676 

9.114334 
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TABLE  A- IV 


Mean  Square 

Errors 

Sample  size 

8 

True  P 

3 

True  Q 

3 

MSPP 

MSEQ 

MM 

MSEB 

MME 

3 .427817 

3.354894 

2.904387 

2.940100 

MLE 

3.049777 

2.987382 

2.904387 

2.940100 

MDE 

9.766766 

9.877664 

2.890916 

2.957770 

TABLE  A-V 
Mean  Square  Errors 


Sample  Size  8 


True 

P 

9 

True 

Q 

4 

sssscssxss 

—  _ — 

BISSSSBXSSSSS 

mcm.mEnmmmm 

1 

M$EP 

MSEQ 

MSM 

MSEB 

1 

1 

MME 

56.805536 

7.960173 

13.934737 

1.117937 

1 

1 

MLE 

54.788050 

7.319176 

13.934737 

1.117937 

1 

1 

MDE  * 

60.792229 

18.236968 

13.786411 

1.111636 

1 

TABLE  A-VI 

Mean 

Square  Errors 

Sample  size 

8 

True 

P 

1 

True 

Q 

2 

r 

MSEP 

M1E9 

MM 

MSEB 

i 

i 

MME 

0.332060 

0.975033 

0.335525 

4.998726 

i 

i 

MLE 

0.328806 

0.862249 

0.335525 

4.998726 

i 

i 

MDE 

8.262157 

23.410740 

0.363392 

4.971074 

i 

i 


t 


SO 


) 


TABLE  A-VII 


Mean  Square  Errors 

Sample  size  12 
True  P  3 
True  Q  3 


MSEP 

MSEQ 

MSEA 

MSEB 

MME 

2.825820 

2.868660 

2.089426 

2.126352 

MLE 

2.761430 

2.815917 

2.089426 

2.126352 

MDE 

5.539327 

4.950296 

2.013997 

2.032997 

TABLE  A-VII I 


Mean  Square 

Errors 

Sample  Size 

12 

True  P 

9 

True  Q 

4 

ssse 

M££P 

smasxnacre 

MSEQ 

MSEA 

MSgP 

MME 

52.775368 

7.208131 

12.263634 

0.858765 

MLE 

52.519962 

7.059538 

12.263634 

0.858765 

MDE 

49.354920 

9.095091 

12.013108 

0.826734 

TABLE  A- IX 

Mean  Square  Errors 

Sample  size  12 

True  P  1 

True  Q  2 


MSEP 

MSEQ 

MSEA 

MSEB 

MME 

0.230257 

0.838969 

0.142688 

3.206688 

MLE 

0.207590 

0.777774 

0.142688 

3.206688 

MDE 

1 .013119 

3  .495416 

0.146702 

3.129456 

TABLE  A-X 


Mean  Square  Errors 
Sample  size  16 


True 

P 

3 

True 

Q 

3 

1 

MSfiP 

MSEQ 

MSEA 

MSEB 

1 

1 

MME 

2.609627 

2.558815 

1.826545 

1.650983 

1 

1 

MLE 

2.708616 

2.671452 

1.826545 

1.650983 

1 

1 

MDE 

2.943447 

2.845556 

1.732211 

1.560583 

1 

TABLE  A-XI 

Mean 

Square  Errors 

Sample  Size 

16 

Trne 

P 

9 

True 

Q 

4 

I 

MS  Eg 

MSEQ 

MMA 

MgSP 

~T 

1 

MME 

50.354216 

6.726745 

11.302901 

0.716087 

1 

1 

MLE 

51.150863 

6.818277 

11.302901 

0.716087 

1 

1 

MDE 

46.387784 

6.755438 

11.102680 

0.690490 

1 

TABLE  A-XI I 

Mean 

Square  Errors 

Sample  size 

16 

Trne 

P 

1 

Trne 

Q 

2 

I 

I2I£ 

M2£J2 

urn 

MSEB 

1 

1 

MME 

0.168482 

0.782000 

0.093014 

2.697157 

1 

1 

MLE 

0.155296 

0.777950 

0.093014 

2.697157 

1 

1 

MDE 

0.364832 

1.755449 

0.092245 

2.604381 

1 
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TABLE  A-XIII 


Mean  Square  Error* 

Sample  size  20 
True  P  3 
True  Q  3 


! 

MSE£ 

MSEQ 

MSEA 

MSEB 

1 

1 

MME 

2.300955 

2.353331 

1.454751 

1 .414300 

1 

1 

MLE 

2.483768 

2.551249 

1.454751 

1.414300 

1 

1 

MDE 

2.645247 

2.566947 

1.369513 

1.325909 

1 

TABLE  A-XIV 

Mean 

Square  Errors 

Sample  Size 

20 

True 

P 

9 

True 

Q 

4 

r 

MSI£ 

M£3 

MSEA 

MSEB 

1 

i 

MME 

49.079484 

6.438000 

10.798213 

0.640201 

1 

i 

NLE 

50.543684 

6.711302 

10.798213 

0.640201 

1 

i 

MDE 

43.965367 

5.839172 

10.579763 

0.610275 

1 

TABLE  A- XV 

Mean 

Square  Errors 

Sample  size 

20 

True 

P 

1 

True 

Q 

2 

r 

MSEP 

MSEQ 

MSEA 

MSEB 

1 

i 

MME 

0.125813 

0.645373 

0.057157 

2.090599 

1 

i 

MLE 

0.133994 

0.639272 

0.057157 

2.090599 

1 

i 

MDE 

0.238399 

1.041494 

0.056218 

2.006221 

1 

l 
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Appendix  B 


Tables  o f  Crame r-von  Wises  Distance : 

Means ■  Standard  Deviations .  and  Conf idence  Interval s 


The  following  notation  is  used  in  this  appendix. 


Tersi  Notat  ion 

Method  of  Moments  Estimation  .  MME 

Maximum  Likelihood  Estimation  .  MLE 

Minimum  Distance  Estimation  .  MDE 

Mean  CVM  Distance  .  MCVM 

Standard  Deviation  of  CVM  Distance  ...  SDCVM 
95%  Confidence  Interval 

upper  and  lower  limits  . .  95%  C.I. 


Monte  Carlo  sample  size  is  1000  and  true  values  of  A 
and  B  are  2  and  10  for  all  tables. 
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TABLE  B-I 


CVM  Distance  Statistics 

Sample  size  4 
True  P  3 
True  Q  3 


MCVJC 

SMYM 

££% 

KKE 

0.135119 

0.131025 

0.126998  —  0.143240 

KLE 

0.139685 

0.136396 

0.131231 — 0.148139 

HDE 

0.161920 

0.163691 

0.151774  —  0.172066 

TABLE  B-II 

CVX  Distance  Statistics 

Sample  size  4 

Trne  P  9 

Trne  Q  4 


MCYM 

SDCVS 

ili  S.I. 

KKE 

0.138114 

0.132926 

0.129875  —  0.146353 

KLE 

0.143925 

0.140191 

0.135236 — 0.152614 

KDE 

0.161689 

0.162098 

0.151642—0.171736 

TABLE  B-III 

CVX  Distance  Statistics 

Sample 
True  P 
True  Q 


size 


4 

1 

2 


MCYM 

SBCYM 

££&  £u.Ii. 

tmm 

KKE 

0.140421 

0.142939 

0.131562—0.149280 

KLE 

0.143407 

0.147824 

0.134245—0.152569 

KDE 

0.164817 

0.168862 

0.154351 — 0.175283 

i 
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TABLE  B-IV 


CVM  Distance  Statistics 

Sample  size  S 
True  P  3 
True  Q  3 


MPVM 

SDCVM 

95%  C. I. 

MME 

0.128062 

0.144414 

0.119111 — 0.137023 

MLE 

0.129722 

0.148687 

0.120506—0.138938 

MDE 

0.144501 

0.164251 

0.134321—0.154681 

TABLE  B-V 

CVM  Distance  Statistics 


Sample 
True  P 
True  Cl 

size 

8 

9 

4 

MCYM 

S&£M 

115  C,I, 

MME 

0.122436 

0.128460 

0.114474—0.130398 

MLE 

0.125101 

0.132076 

0.116915—0.133287 

MDE 

0.140132 

0.146378 

0.131059—0.149205 

TABLE  B-V I 

CVM  Distance  Statistics 

Sample  size  8 

True  P  1 

True  Q  2 


MCVM 

SDCVM 

115  <LOa. 

MME 

0.132165 

0.140954 

0.123429—0.140901 

MLE 

0.129635 

0.144304 

0.120691—0.138579 

MDE 

0.147750 

0.164972 

0.137525—0.157975 
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TABLE  B-VII 


TABLE  B-IX 

CVM  Distance  Statistics 

Saaple  size  12 

True  P  1 

True  Q  2 


MCVM 

SDCYM 

21&  £-JL. 

IMM 

MMB 

0.127104 

0.14171 9 

0.118320- 

-0.135888 

MLB 

0.119488 

0.1348(5 

0.111129- 

-0.127847 

MDE 

0.141441 

0.1(8347 

0.131007- 

-0.151875 
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TABLE  B-X 


CVM  Distance  Statistics 

Sample  size  16 
True  P  3 
Trne  Q  3 


1 

M£VM 

SPCVM 

m  &jl. 

1 

MME 

0.114084 

0.132344 

0.105881- 

-0.122287 

1 

NLE 

0.115781 

0.129284 

0.107768- 

-0.123794 

1 

MDE 

0.131064 

0.154444 

0.121491- 

-0.140637 

TABLE  B-XI 

CVM 

Distance  Statistics 

Sample  size 

16 

True 

P 

9 

True 

Q 

4 

mm 

SS3SSI 

mmmxmmmmmmmmmmm 

BeccsascE&KSscasmti&zsttmEC 

*8se«  mmmmst 

1 

M£VM 

SBC.YM 

iii  £ 

1 

1 

MNE 

0.114048 

0.132149 

0.105857- 

-0.122239 

1 

1 

MLE 

0.117757 

0.135425 

0.109363- 

-0.126151 

1 

1 

MDE 

0.130373 

0.149206 

0.121125- 

-0.139621 

1 

TABLE  B-X1I 

CVM 

Distance  Statistics 

Sample  size 

16 

True 

P 

1 

True 

Q 

2 

T 

HCYM 

SR£YM 

ili  SL-Li- 

“7 

i 

MME 

0.131006 

0.149941 

0.121713- 

-0.140299 

i 

i 

MLE 

0.123323 

0.143858 

0.114407- 

-0.132239 

i 

i 

MDE 

0.141448 

0.168164 

0.131025- 

-0.151871 

i 
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TABLE  B-XIII 


CVM  Distance  Statistics 

Sample  size  20 
True  P  3 
True  Q  3 


HcyM 

SDCVM 

95%  C.I. 

MME 

0.109630 

0.124946 

0.101886  —  0.117374 

MLE 

0.112547 

0.125415 

0.104770  —  0.120316 

MDE 

0.128660 

0.147926 

0.119491—0.137829 

TABLE  B-XIV 


CVM  Distance  Statistics 


Sample  size 

20 

True 

P 

9 

True 

Q 

4 

mm 

BSSSSSSSSXSSSSSB 

mmmmmmmmmmmmmrnmmmmmmmm 

as 

1 

M£yj 

SBCVM 

91* 

1 

1 

MME 

0.119805 

0.135413 

0.111412—0.128198 

1 

1 

MLE 

0.124453 

0.137251 

0.115946 — 0.132960 

1 

1 

MDE 

0.137788 

0.154729 

0.128198  —  0.147378 

1 

TABLE  B-XV 

CVM  Distance  Statistics 

Sample  size  20 

True  P  1 

True  Q  2 


MCVM 

SBSYM 

SLJL. 

MME 

0.122389 

0.135526 

0.113989- 

-0.130789 

MLE 

0.116783 

0.130674 

0.108684- 

-0.124882 

MDE 

0.131730 

0.148973 

0.122497- 

-0.140963 
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PROGRAM  BETAGEN 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


WRITTEN  BY  2LT  DAVID  E.  BERTRAND  AFIT/GOR-81D  FOR  MS  THESIS 
DECEMBER  1981 


PURPOSE: 


VARIABLES: 


GENERATION  OF  SAMPLES  OF  BETA  RANDOM  VARIATES 
CALCULATION  OF  MEAN  AND  STANDARD  DEVIATION  OF  SAMPLES 
CALCULATION  OF  ESTIMATES  OF  A  AND  B  BY  INTERPOLATION 


DSEED  - 

P 

Q 

A 

B 

NR 

NREPS  - 
GGBTR  - 

VSRTA  - 

SUM  - 
X 

MEAN  - 
SD 
Y1 
Y2 

YN1  - 
YN 

SLOPE  - 
ESTA  - 
ESIB  - 


I/O  FILES:  INPUT 
TAPE5 


SEED  FOR  RANDOM  NUMBER  GENERATOR 

FIRST  SHAPE  PARAMETER  OF  TRUE  DISTRIBUTION 

SECOND  SHAPE  PARAMETER  OF  TRUE  DISTRIBUTION 

LOWER  LIMIT  OF  TRUE  DISTRIBUTION 

UPPER  LIMIT  OF  TRUE  DISTRIBUTION 

DESIRED  SAMPLE  SIZE 

NUMBER  OF  SAMPLES  TO  BE  GENERATED 

IMSL  ROUTINE  WHICH  GENERATES 

STD  BETA  VARIATES 

IMSL  ROUTINE  WHICH  SORTS  AN  ARRAY 

INTO  ASCENDING  ORDER 

DUMMY  VARIABLE  USED  IN  FINDING  MEAN,  STD  DEV 
ARRAY  CONTAINING  SAMPLE  POINTS 
ARITHMATIC  MEAN  OF  SAMPLE 
STANDARD  DEVIATION  OF  SAMPLE  (BIASED) 

MEDIAN  RANK  OF  FIRST  ORDER  STATISTIC 
MEDIAN  RANK  OF  SECOND  ORDER  STATISTIC 
MEDIAN  RANK  OF  N-1ST  ORDER  STATISTIC 
MEDIAN  RANK  OF  NTH  ORDER  STATISTIC 
SEE  EQUATION  3.6  IN  THESIS 
INTERPOLATED  ESTIMATE  OF  A:  SEE  EON  3.7 
INTERPOLATED  ESTIMATE  OF  B:  SEEE  EON  3.8 

UNFORMATTED  INPUT  OF  TRUE  PARAMETER  VALUES 
OUTPUT  OF  TRUE  PARAMETERS,  SAMPLES, 
CALCULATED  VALUES 


IMPORTANT:  IMSL  LIBRARY  MUST  BE  ATTACHED  BEFORE  PROGRAM  IS  RUN 
REVIEW  IMSL  MANUAL  ON  GGBTR  AND  VSRTA  BEFORE  RUNNING 
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n  n  n  non 


EXTERNAL  GGBTR,VSRTA 
DOUBLE  PRECISION  DSEED 
DIMENSION  X(50) 

REAL  P,Q,MEAN 
INTEGER  NR 

DSEED-1859217525.D0 

C  READ  PARAMETERS  AND  WRITE  THEM  TO  FILE 
READ*,  P,Q,A,B,NR,NREPS 
WRITE(5,100)  NREPS.NR, P,Q,  A,B 

100  FORMAT(I4/I3/4(F10.6/)) 

C  *****  BEGIN  LOOP  FOR  GENERATION  OF  SAMPLES  •*’ 
DO  999  J-l.NREPS 
WRITE(5,103)  J 
103  FORMATCI4) 

C  GENERATE  AND  SORT  SAMPLE  FROM  STANDARD  BETA 

CALL  GGBTR(DSEED,  P,  Q,  NR,  X) 

CALL  VSRTA(X.NR) 

UNSTANDARDIZE  RANDOM  VARIATES, 

WRITE  DATA  TO  FILE 
CALCULATE  MEAN 
DO  10  1=1, NR 

X(I)=(B-A)*X(I)+A 
WRITE(5,101)  X(I) 

101  FORMAT (F10. 6) 

SUM=SUM+X(I) 

10  CONTINUE 
MEAN" SUM/ NR 

CALCULATE  STANDARD  DEVIATION 
SUM-0.0 
DO  20  1=1, NR 

SUM-SUM* (X( I) -MEAN) *(X(I)-MEAN) 

20  CONTINUE 

SD=( SUM/NR) **0.5 
CALCULATE  MEDIAN  RANKS 
INTERPOLATE  TO  ESTIMATE  A  AND  B 
Yl=(l.-0.3)/ (NR+0.4) 

Y2=(2. -0.3) /(NR+0.4) 

SL0PE=(Y2-Y1) / (X(2)-X( 1) ) 

C  EQUATION  3.7: 

ESTA=X(1)-Y1/ SLOPE 
YN1=(NR— 1-0.3)/ (NR+0.4) 

YN=(NR-0. 3) /(NR+0.4) 

SLOPE-(YN-YNl)/ (X(NR)-X(NR-l) ) 

C  EQUATION  3.8: 

ESTB-(l-YN)  /SLOPE+KNR) 

C  WRITE  CALCULATED  VALUES  TO  FILE 

WRITE(3,102)  ESTA,ESTB,MEAN,SD 

102  FQRMAT(F10.6/F10.6/F10.6/F10.6) 
c  *••»*  end  loop  ***** 

999  CONTINUE 
STOP 
END 
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APPENDIX  D 
COMPUTER  LISTING  OF  PROGRAM  MOMENTS 
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PROGRAM  MOMENTS 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


WRITTEN  BY  2LT  DAVID  E.  BERTRAND  AFIT/GOR-81D  FOR  MS  TEES  IS 

DECEMBER  1981 

PURPOSE:  TWO  AND  FOUR  MOMENT  ESTIMATION  OF 

THE  BETA  DISTRIBUTION 

VARIABLES:  NREPS  -  #  SAMPLES  FOR  WHICH  ESTIMATION  DONE  (INPUT) 

N  -  SAMPLE  SIZE  (INPUT) 

PR  -  TRUE  VALUE  OF  FIRST  SHAPE  PARAMETER  (INPUT) 

OR  -  TRUE  VALUE  OF  SECOND  SHAPE  PARAMETER  (INPUT) 

AR  -  TRUE  LOWER  LIMIT  OF  DISTRIBUTION  (INPUT) 

BR  -  TRUE  UPPER  LIMIT  OF  DISTRIBUTION  (INPUT) 

NDIV  -  NUMBER  OF  TIMES  4  MOMENT  ESTIMATION  FAILS 
i  -  SAMPLE  INDEX  (INPUT) 

X  -  ARRAY  OF  SAMPLE  POINTS  (INPUT) 

P  -  ESTIMATE  OF  FIRST  SHAPE  PARAMETER 
Q  -  ESTIMATE  OF  SECOND  SHAPE  PARAMETER 
A  -  ESTIMATE  OF  LOWER  LIMIT  (INITIAL  VALUE  INPUT) 

B  -  ESTIMATE  OF  UPPER  LIMIT  (INITIAL  VALUE  INPUT) 

MEAN  -  ARITHMATIC  MEAN  OF  SAMPLE  (INPUT) 

SD  -  STANDARD  DEVIATION  OF  SAMPLE  (INPUT) 

T#  -  SUM  OF  (X(I)**#),  #~1,..,4 
Y  -  STANDARDIZED  MEAN 

Z  -  STANDARDIZED  SAMPLE  DEVIATION 

ZSCNT  -  IMSL  ROUTINE  WHICH  SOLVES  SIMULTANEOUS 
NONLINEAR  EQUATIONS  BY  THE  SECANT  METHOD 
NPAR  -  #  PARAMETERS  SOLVED  FOR  BY  ZSCNT  (  -  #  EONS) 
NSIG  -  #  SIGNIFICANT  DIGITS  ZSCNT  TO  SOLVE  FOR 
ITMAX  -  MAXIMUM  #  ITERATIONS  ZSCNT  ALLOWED 
PAR  -  ARRAY  OF  PARAMETERS  INPUTED  BY  ZSCNT 
C  -  ARRAY  OF  CONSTANTS  INPUTED  BY  ZSCNT 

CONTAINS  SAMPLE  SKEWNESS,  SAMPLE  KURTOSIS 
FCN  -  SUBROUTINE  CONTAINING  EQUATIONS  TO  BE  SOLVED 
FNORM  -  ZSCNT  PARAMETER  (  SEE  IMSL  MANUAL  ) 

W  -  ZSCNT  WORKSPACE  (  SEE  IMSL  MANUAL  ) 

IER  -  ZSCNT  GENERATED  ERROR  INDICATOR 
(  SEE  IMSL  MANUAL  ) 

I/O  FILES:  TAPE5  -  INPUT,  CONTAINS  TRUE  PARAMETERS  AND  RANDOM 

SAMPLES  WITH  ESTIMATED  A  +  B,  MEAN,  STD  DEV. 
TAPES  -  OUTPUT,  CONTAINS  TRUE  PARAMETERS  AND  4-MOMENT 
PARAMETER  ESTIMATES  FOR  EACH  SAMPLE 
TAPE7  -  OUTPUT,  CONTAINS  TRUE  PARAMETERS,  AND  2-MOMENT 
ESTIMATES  OF  P+Q.  INTERPOLATED  ESTIMATES  OF 
A+B  FOR  EACH  SAMPLE 

OUTPUT-  CONTAINS  MESSAGE  ON  #  OF  TIMES  4-MOMENT 
ESTIMATION  FAILED 
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a  a  a  a  n  o  an  an  a  a  a  a  a 


IMPORTANT:  IMSL  LIBRARY  MUST  BE  ATTACHED  BEFORE  PROGRAM  IS  RUN 
REVIEW  IMSL  MANUAL  ON  ZSCNT  BEFORE  RUNNING 


EXTERNAL  ZSCNT,  FCN 

DIMENSION  X(50),PAR(2),C(2),W(42),F(2) 

REAL  MEAN 

READ  IN  TRUE  PARAMETERS  AND 
WRITE  THEM  TO  FILE 
READ(5,100)  NREPS,N,PR,QR,AR,BR 

100  F0RMAT(I4/I3/4(F10.6/)) 

WRITE(6,106)  NREPS 
WRITE(7,106)  NREPS 
WRITE(6»101)  N 
WRITE(7,101)  N 

101  F0RMAT(I3) 

WRITE(6 ,102)  PR,QR,AR,BR 
WRITE(7,102)  PR.QR.AR.BR 

102  FORMAT(4(F10.6,3X)/) 

INITIALIZE  DIVERGENCE  COUNTER 
NDIV=0 

*****  BEGIN  LOOP  FOR  NREPS  SAMPLES  ***** 

DO  999  J-l, NREPS 
READ(5 .106)  K 
106  FORMAKI4) 

INPUT  SAMPLE  POINTS 

AND  CALCULATE  SUMS 

T2-0.0 

T3-O.0 

T4-0.0 

DO  1  I-l.N 

READ(5,103)  X(I) 

103  FORMAT(FIO.S) 

I2«T2+X(I)**2 

T3«T3+X(I)**3 

T4-T4+X(I)**4 

1  CONTINUE 

INPUT  CALCULATED  VALUES 
READ(5.104)  A.B.MEAN.SD 

104  FORMAKF10.6/F10.6/F10.6/F10.6) 

T1-MEAN*N 

FIND  2-MOMENT  ESTIMATES  OF  P.Q 
Y“(MEAN-A) / (B-A) 

Z-SD/ (B-A) 

P«(Y*Y*(1-Y))/(Z*Z)  -T 
Q“((1-Y)*(1-Y)*Y)/ (Z*Z)  -(1-Y) 

WRITE  SAMPLE  INDEX.  2-MOMENT  ESTIMATES  OF 
P+Q,  AND  INTERPOLATED  ESTIMATES  OF  A+B  TO  FILE 
WRITE(7,106)  X 
WRITE(7,102)  P.Q 
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C  SET  PARAMETERS  FOR  ZSCNT 

NPAR-2 
NSIG-3 
ITMAX=100 

C  SOLVE  3RD  AND  4TH  MOMENT  EONS  FOR  P.Q 

PAR(1)-P 
PAR(2)»Q 

C  CALCULATE  SAMPLE  SKEWNESS  +  KURTOSIS 

C  USING  EONS  3.15  AND  3.16  IN  THESIS 

C(l)=(N*N*T3-3*N*T2*Tl+2*Tl**3)/(SD**3*N*(N-l)*(N-2)) 
C(2)»(  (N**3+N*N)  *T4-4*(N*N+N)  *T3*T1-3*(N*N-N)  *T2**2 

+12*N*T2*T1**2-6*T1**4 ) / ( SD**4*N* (N-l ) * (N-2 ) * (N-3 ) ) 
CALL  ZSCNT(FCN,NSIG,NPAR, UMAX, C.PAR, FNORM ,  W ,  IER) 

C  TEST  FOR  FEASIBILITY  OF  ESTIMATES 

IF(  PAR(1).GT.0.0  .AND  PAR(2).GT.0.0)  THEN 
C  IF  ESTIMATES  ARE  FEASIBLE,  SET  EQUAL  TO  P+Q, 

C  AND  FIND  A+B  USING  FIRST  TWO  MOMENTS 

P-PAR(l) 

Q-PAR(2) 

A=MEAN-SD*(P*(P+Q+1)/Q)  **0.5 
B= ( MEAN* ( P+Q)  -A*Q)  /  P 

C  IF  ESTIMATES  OF  A+B  ARE  INSIDE  1ST  AND  LAST 

C  ORDER  STATISTICS,  USE  ORDER  STATISTICS  AS  ESTIMATES 

A-MIN(A.X(1)) 

B=MAX(B,X(N) ) 

ELSE 

C  IF  ESTIMATES  ARE  INFEASIBLE,  USE  2-MOMENT  ESTIMATES 

C  (  DON'T  CHANGE  VALUE  OF  P.Q)  AND  ADD  1  TO  COUNTER 

NDIV-NDIV+1 
END  IF 

C  WRITE  SAMPLE  INDEX  AND  4-MOMENT  ESTIMATES 

C  OF  A.B.P.Q  TO  FILE 

WRITE(6.106)  X 
WRITE(6,102)  P.Q.A.B 

c  *****  END  LOOP  ***** 

999  CONTINUE 

C  PRINT  OUT  #  TIMES  4-MOMENT  ESTIMATION  FAILED 

PRINT*.  '  NUMBER  OF  TIMES  DID  NOT  CONVERGE  -'.NDTV 

STOP 

END 
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SUBROUTINE  FCN(PAR,F,NP»C) 


C 
C 
C 
C 
C 
C 
C 
C 
C 
C 
C 
C 
C 
C 

DIMENSION  PAR(NP),F(NP),C(2) 

C  TEST  FOR  FEASIBILITY 

IF  (PAR(l) .LE.0.0  .OR.  PAR(2).LE.0.0)  THEN 
C  SET  EQUATION  VALUES  TO  ZERO 

F(1)«0.0 
F(2)*0 .0 
ELSE 

C  CHANGE  TO  SHORTER  NOTATION 

PX=PAR<1) 

QX-PARU) 

C  EVALUATE  EQUATIONS 

F ( 1 ) -2 . • ( QX-PX) * ( ( PX+QX+1 ) * *0 . 5 ) / ( ( PX+QX+2 ) * ( ( PX*QX) **0 . 5 ) ) -C ( 1 ) 
F(2)«3 . •(PX+QX+1) •<2*<FX+QX) ••2+PX*QX*(PX+QX-6) ) 

+  /  (PX*QX*(PX+QX+2)*(PX+QX+3))-C(2) 

ENDIF 

RETURN 

END 


PURPOSE:  EVALUATE  EQUATIONS  WHICH  ZSCNT  IS  TRYING  TO  SOLVE 

VARIABLES:  PAR.C  -  SEE  MAIN  PROGRAM 

F  -  ARRAY  OF  EQUATION  VALUES  AT  THIS  P.Q 

1ST  EON  IS  DIFFERENCE  BETWEEN  POP.  +  SAMPLE 
SKEWNESS,  2ND  EON  IS  DIFFERENCE  BETWEEN 
POP.  +  SAMPLE  KURTOSIS 

NP  -  NUMBER  OF  PARAMETERS,  ALSO  #  OF  EQUATIONS 
PX  -  SHORT  NOTATION  FOR  PAR(l) 

QX  -  SHORT  NOTATION  FOR  PAR(2) 
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PROGRAM  MDCVM 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 
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WRITTEN  BY  2LT  DAVID  E.  BERTRAND  AFIT/GOR-81D  FOR  MS  THESIS 

DECEMBER  1981 

PURPOSE:  MINIMUM  DISTANCE  ESTIMATION  OF  THE  FOUR  PARAMETERS 

OF  THE  BETA  DISTRIBUTION 

VARIABLES:  NREPS  -  #  SAMPLES  FOR  WHICH  ESTIMATION  DONE  (INPUT) 

N  -  SAMPLE  SIZE  (INPUT) 

PR  -  TRUE  VALUE  OF  FIRST  SHAPE  PARAMETER  (INPUT) 
OR  -  TRUE  VALUE  OF  SECOND  SHAPE  PARAMETER  (INPUT) 

AR  -  TRUE  LOWER  LIMIT  OF  DISTRIBUTION  (INPUT) 

BR  -  TRUE  UPPER  LIMIT  OF  DISTRIBUTION  (INPUT) 

E  -  SAMPLE  INDEX  (INPUT) 

X  -  ARRAY  OF  SAMPLE  POINTS  (INPUT) 

P  -  ESTIMATE  OF  FIRST  SHAPE  PARAMETER 

Q  -  ESTIMATE  OF  SECOND  SHAPE  PARAMETER 

A  -  ESTIMATE  OF  LOWER  LIMIT(INITIAL  VALUE  INPUT) 

B  -  ESTIMATE  OF  UPPER  LIMIT  (INITIAL  VALUE  INPUT) 

MEAN  -  ARITHMATIC  MEAN  OF  SAMPLE  (INPUT) 

SD  -  STANDARD  DEVIATION  OF  SAMPLE  (INPUT) 

Y  -  STANDARDIZED  MEAN 

Z  -  STANDARDIZED  STANDARD  DEVIATION 

ZXMIN  -  IMSL  ROUTINE  USED  TO  MINIMIZE  DISTANCE 
NPAR  -  NUMBER  OF  VARIABLES  DEPUTED  BY  ZXMIN 
NSIG  -  #  SIGNIFICANT  DIGITS  ZXMIN  TO  SOLVE  FOR 
MAXFN  -  MAXI  RUN  #  FUNCTIONAL  EVALUATIONS  BY  ZXMIN 
IOPT  -  ZXMIN  INPUT  OPTION  (SEE  IMSL  MANUAL) 

PAR  -  ARRAY  OF  PARAMETER  VALUES  USED  BY  ZXMIN 
H.G.W  -  ARRAYS  USED  BY  ZXMIN  (SEE  IMSL  MANUAL) 

DISTPQ  -  SUBROUTINE  TO  FIND  DISTANCE.  P.Q  INPUT 
DISTAB  -  SUBROUTINE  TO  FIND  DISTANCE.  A.B  INPUT 
F  -  DISTANCE  VALUE:  SEE  SUBROUTINE 

IER  -  ZXMIN  GENERATED  ERROR  MESSAGE 

(  SEE  IMSL  MANUAL  ) 

I/O  FILES:  TAPES  -  INPUT.  CONTAINS  TRUE  PARAMETERS  AND  RANDOM 
SAMPLES  WITH  EST.  A  +  B.  MEAN,  STD  DEV. 

TAPES  -  OUTPUT,  CONTAINS  TRUE  PARAMETERS  AND 
PARAMETER  ESTIMATES  FOR  EACH  SAMPLE 

! 

IMPORTANT:  IMSL  LIBRARY  MUST  BE  ATTACHED  BEFORE  PROGRAM  IS  RUN 

REVIEW  IMSL  MANUAL  ON  ZXMIN  AND  MDBETA  BEFORE  RUNNING' 
(MDBETA  USED  IN  SUBROUTINE) 


COMMON  P.Q,A.B,X(S0).N 

EXTERNAL  ZXMIN,  MDBETA, DISTPQ, DISTAB 

DIMENSION  PAR(2),B(3),G(2),W(6) 

REAL  MEAN 
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C  INPUT  TRUE  PARAMETERS 

READ(S.IOO)  NREPS, N, PR. QR.AR.BR 

100  F0RMAT(I4/I3/4(F10.6/)) 

WRITE(6,106)  NREPS 
WRITE(6,101)  N 

101  F0RMAT(I3) 

WRITE(6,102)  PS.QR.AR.BR 

102  FORMAT(4(F10.6)/) 

102  FORMAT(F10.6,3X)/) 

C  BEGIN  LOOP  FOR  NREPS  SAMPLES  ••••• 

DO  99  J«1  .NREPS 

C  INPUT  SAMPLE  INDEX 

READ(5,106)  X 
106  FORMAT(I4) 

C  INPUT  SAMPLE  POINTS 

DO  1  1*1, N 

READ (5, 103)  X(I) 

103  FORMAT(F10.6) 

1  CONTINUE 

C  INPUT  CALCULATED  VALUES 

READ(5,104)  A.B.MEAN.SD 

104  FORMAT(F10.6/F10.6/F10.6/F10.6) 

C  CALCULATE  2-MOMENT  ESTIMATES  OF  P.Q 

Y*(MEAN-A) / (B-A) 

Z*SD/ (B— A) 

P-<Y*Y*<1-Y))/(Z*Z)-Y 

Q*( (1-T) *(1-T) *Y) / (Z*Z)-(1-Y) 

C  SET  ZXMIN  PARAMETERS 

NPAR-2 
NSIG-3 
MAXFN-500 
I0PT-0 

C  MINIMIZE  DISTANCE  FOR  P.Q 

PAR(1)-P 
PAR(2)*Q 

CALL  ZXM  IN ( DI STPQ, NPAR.NSlG,  MAXFN,  IOPT.PAR.B.G.F, V,  IER) 
P-PAR(l) 

Q-PAR(2) 

C  MINIMIZE  DISTANCE  FOR  A.B 

PAR(1)*A 
PAR(2)*B 

CALL  ZXMIN(DISTAB,NPAR,NSIG,MAXFN, IOPT.PAR.H.G.F.W.IER) 
A-PAR(l) 

B*PAR(2) 


C  WRITE  SAMPLE  INDEX,  ESTIMATES  TO  FILE 

WRITE(6,105)  J 
1 05  FORMATU4) 

WRITE(6,102)  P.Q, A.B 
C  •***•  END  LOOP  ••••• 


999  CONTINUE 
STOP 
END 
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n  n 


SUBROUTINE  DISTPQ(NP.PAR.F) 


c 

c 

c 

* 

•  PURPOSE:  FIND  DISTANCE  BETWEEN  ESTIMATED  CDF  AND  1/N  EDF 

***• 

c 

*  FOR  VARIABLE  P,Q  KEEPING  A,B  FIXED 

t 

c 

•  VARIABLES:  NP 

-  NUMBER  OF  PARAMETERS:  ALWAYS  2 

c 

*  PAR 

-  VECTOR  OF  PARAMETER  VALUES  P,Q 

c 

*  Y 

-  STANDARDIZED  SAMPLE  POINT 

c 

*  MDBETA 

-  IMSL  ROUTINE  WHICH  EVALUATES  BETA  CDF 

c 

•  Z 

-  VALUE  OF  BETA  CDF  AT  POINT  Y 

c 

•  IER 

-  MDBETA  GENERATED  ERROR  MESSAGE 

c 

* 

(  SEE  IMSL  MANUAL) 

c 

*  SUM 

-  DUMMY  VARIABLE  USED  TO  ADD  UP  DISTANCE 

c 

•  F 

-  DISTANCE  VALUE  AT  THIS  P,Q,A,B 

c 

•  P,Q,A, 

c 

c 

c 

•  B,X,N 

• 

-  SEE  MAIN  PROGRAM 

*•** 

COMMON  P,Q,A,B.X(50),N 
INTEGER  NP 

REAL  PAR  { NP)  ,F,  Y,Z,  SUM 

SUM=0 .0 
DO  92  I-l.N 

C  STANDARDIZE  SAMPLE  POINT 

Y«(X(I)-A)/(B-A) 

C  EVALUATE  CDF 

CALL  MDBETA  ( Y ,  PAR  ( 1 ) ,  PAR( 2)  ,Z,  IER) 
ADD  TO  SUM  FOR  DISTANCE 
SEE  EON  3  .25 

SUM-SUM+(Z- (2*1-1 . )  /  (2 .  *N) )  **2 
92  CONTINUE 
C  SET  F  EQUAL  DISTANCE 
F-SUM 
RETURN 
END 
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on  o  n  o  o 


SUBROUTINE  DISTAB(NP.PAR.F) 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

€ 

C 


PURPOSE:  FIND  DISTANCE  BETWEEN  ESTIMATED  CDF  AND  1/N  EDF 

FOR  VARIABLE  A,B  KEEPING  P.Q  FIXED 

VARIABLES:  NP  -  NUMBER  OF  PARAMETERS:  ALWAYS  2 

PAR  -  VECTOR  OF  PARAMETER  VALUES  A,B 

Y  -  STANDARDIZED  SAMPLE  POINT 

MDBETA  -  IMSL  ROUTINE  WHICH  EVALUATES  BETA  CDF 
Z  -  VALUE  OF  BETA  CDF  AT  POINT  Y 

IER  -  MDBETA  GENERATED  ERROR  MESSAGE 

(  SEE  IMSL  MANUAL) 

SUM  -  DUMMY  VARIABLE  USED  TO  ADD  UP  DISTANCE 

F  -  DISTANCE  VALUE  AT  THIS  P,Q,A»B 

P.Q.A. 

B.X.N  -  SEE  MAIN  PROGRAM 


COMMON  P,Q,  A,B,X(50)  ,N 
INTEGER  NP 

REAL  PAR(NP),F,Y,Z.SUM 

USE  ORDER  STATISTICS  IF  INTERPOLATED  VALUES 
ARE  INSIDE  1ST  AND  LAST  ORDER  STATISTICS 
IF(PAR(1)  .GT.  1(1))  PAR(1)-X(1) 

IF(PAR(2)  .LT.  I(N))  PAR(2)-X(N) 

SUM-0.0 
DO  91  I-l.N 

STANDARDIZE  SAMPLE  POINT 
Y*(X(I)-PAR(1) )/ (PAR(2)-PAR(1) ) 

EVALUATE  CDF 
CALL  MDBETA(Y,P,Q,Z,IER) 

ADD  TO  SUM  FOR  DISTANCE 

S^SUM+(Z-(2*I-1 . )  /  (2 .  *N) )  **2 
91  CONTINUE 
C  SET  F  EQUAL  DISTANCE 
F-SUM 
RETURN 
END 
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PROGRAM  EVAL 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


WRITTEN  BY  2LT  DAVID  E.  BERTRAND  AFIT/GOR-81D  FOR  MS  THESIS 

DECEMBER  1981 

PURPOSE:  EVALUATE  A  SET  OF  ESTIMATES  ON  THE  BETA  DISTRIBUTION 

-  CALCULATE  THE  MEAN  SQUARE  ERROR  OF  THE  SET  FOR 
EACH  OF  THE  FOUR  PARAMETERS 

-  FIND  THE  CVM  DISTANCE  BETWEEN  THE  ESTIMATED  AND 
TRUE  CDF.  AND  FIND  MEAN  AND  STD  DEV  OF  CVM  FOR  SET 


VARIABLES:  NREPS 

N 

PR 

OR 

AR 

BR 

X 


SUM1  - 

SEP 

SEQ 

SEA 

SEB 

P 

Q 

A 

B 

SUM 

Y 

MDBETA  - 
IER 

FN 

FR 

BETA  - 
GAMMA  - 
F 

CVM 

MSE 

MSEQ  - 
MSEA  - 
MSEB  - 
MCVM  - 
SDCVM  - 


#  OF  ESTIMATES  OF  EACH  PARAMETER  IN  SET 
(INPUT) 

SIZE  OF  SAMPLES  ON  WHICH  ESTIMATES  ARE  BASED 
(INPUT) 

TRUE  VALUE  OF  FIRST  SHAPE  PARAMETER  (INPUT) 
TRUE  VALUE  OF  SECOND  SHAPE  PARAMETER  (INPUT) 
TRUE  VALUE  OF  LOWER  LIMIT  (INPUT) 

TRUE  VALUE  OF  UPPER  LIMIT  (INPUT) 

ARRAY  CONTAINING  GAUSSIAN  QUADRATURE  POINTS 
(INPUT) 

ARRAY  CONTAINING  GAUSSIAN  QUADRATURE  WEIGHTS 
(INPUT) 

DUMMY  VAR.  USED  TO  SUM  CVM  STATS  OF  EACH 
REPITITION 

SQUARED  ERROR  OF  P  IN  THIS  REPITITION 
SQUARED  ERROR  OF  Q  IN  THIS  REPITITION 
SQUARED  ERROR  OF  A  IN  THIS  REPITITION 
SQUARED  ERROR  OF  B  IN  THIS  REPITITION 
ESTIMATE  OF  FIRST  SHAPE  PARAMETER  (INPUT) 
ESTIMATE  OF  SECOND  SHAPE  PARAMETER  (INPUT) 
ESTIMATE  OF  LOWER  LIMIT  (INPUT) 

ESTIMATE  OF  UPPER  LIMIT  (INPUT) 

DUMMY  VAR  FOR  EVAL  OF  INTEGRAL  BY  QUADRATURE 
STANDARDIZED  QUADRATURE  POINT 
IMSL  ROUTINE  WHICH  EVALUATES  STD  BETA  CDF 
MDBETA  GENERATED  ERROR  INDICATOR 
(  SEE  IMSL  MANUAL  ) 

VALUE  OF  ESTIMATED  CDF  AT  QUADRATURE  POINT 
VALUE  OF  TRUE  CDF  AT  QUADRATURE  POINT 
BETA  FUNCTION  -  SEE  EON  3.?? 

IMSL  ROUTINE  WHICH  EVALUATES  THE  GAMMA  FCN 
VALUE  OF  TRUE  PDF  AT  QUADRATURE  POINT 
ARRAY  CONTAINING  CVM  DISTANCE  BETWEEN 
ESTIMATED  AND  TRUE  CDF  FOR  EACH  REPITITION 
MEAN  SQUARE  ERROR  OF  P 
MEAN  SQUARE  ERROR  OF  Q 
MEAN  SQUARE  ERROR  OF  A 
MEAN  SQUARE  ERROR  OF  B 
MEAN  OF  THE  CVM  DISTANCES 
STD  DEV  OF  THE  CVM  DISTANCES 
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c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 

EXTERNAL  MDBETA,  GAMMA 

REAL  X<  8 , 2 ) ,  W  ( 8) ,  MCVM ,  MSEP,  MSEQ,  HSEA,  MSEB,  CVM  (1000) 

C  INPUT  TRUE  PARAMETER  VALUES 

READ(6 ,100)  NREPS , , N,PR,QR, AR.BR 
100  FORMAT(I4/I3/ 4(F10.6.3X)/) 

C  INPUT  QUADRATURE  POINTS  AND  WEIGHTS 

DO  1  1=1,8 

READ*,  X(I,1).W(I) 

X(I,2)=-1.*X(I,1) 

C  TRANSLATE  QUAD  PTS  TO  (AR.BR)  INTERVAL 

DO  2  J=l,2 

X(I, J)=( (BR-AR) /2) *X(I, J)+(BR+AR)/2 
2  CONTINUE 
1  CONTINUE 

C  INITIALIZE  SUMS 

SUM1-0.0 
SEP=0 .0 
SEQ-G.O 
SEA-0.0 
SEB-0.0 


C  *****  BEGIN  LOOP  FOR  NREPS  REP IT IT IONS 
DO  999  K=l. NREPS 

C  INPUT  PARAMETER  ESTIMATES 

READ(6,101)  P,Q,A,B 
101  FORMAT(/4(F10.6,3X)/) 

C  EVALUATE  CVM  INTEGRAL  BY  QUADRATURE 

SUM-0.0 
DO  888  J-1,2 
DO  777  1=1,8 

C  STANDARDIZE  QUADRATURE  POINT 

C  USING  ESTIMATED  VALUES  OF  A  +  B 

Y»(X(I,J)-A)/(B-A) 

C  RESET  STANDARDIZED  QUAD  FT  IF 

C  IT  IS  OUTSIDE  ESTIMATED  RANGE 

IF(Y.LT.O.O)  Y-0.0 
IF(Y.GT.l.O)  Y-1.0 
C  EVALUATE  EST.  BETA  CDF 

CALL  MDBETA(Y,  P,  Q,  FN,  IER) 

C  STANDARDIZE  QUADRATURE  POINT 

C  USING  TRUE  VALUES  OF  A  +  B 


I/O  FILES:  TAPE6  -  INPUT,  CONTAINS  TRUE  PARAMETER  VALUES  AND 

ESTIMATES  FOR  EACH  REPITITION 
TAPE7  -  OUTPUT,  CONTAINS  MSE'S  AND  MEAN  +  STD  DEV 
OF  CVM  DISTANCES 

INPUT  -  CONTAINS  8  POSITIVE  QUADRATURE  POINTS  AND 
WEIGHTS  FOR  16  POINT  GAUSSIAN  QUADRATURE 

IMPORTANT:  IMSL  LIBRARY  MUST  BE  ATTACHED  BEFORE  PROGRAM  IS  RUN 

REVIEW  IMSL  MANUAL  ON  MDBETA  AND  GAMMA  BEFORE  RUNNING 


T=(X(I,J)-AR)/(BR-AR) 

EVALUATE  TRUE  BETA  CDF 
CALL  MDBETA(I,PR,QR,FR,IER) 

EVALUATE  TRUE  BETA  PDF 
BETA=GAMMA(PR) *GAMMA(QR) /GAMHA(PR+QR) 
F=(1/BETA)*(X(I.J)-AR)**(PR-1)*(BR-X(I,J))**(QR-1) 
/ (BR-AR) ♦♦(PR+QR-1 ) 

ADD  TO  SUM  FOR  EVAL.  OF  INTEGRAL 
SUM-SUM+f (I) *(FN-FR) **2*F 
777  CONTINUE 
888  CONTINUE 

CALCULATE  CVM  STATISTIC 

CVM(K)=N*< (BR-AR) /2) ♦SUM 

ADD  TO  SUMS  FOR  CVM.  SQUARED  ERRORS 

SUM1=SUM1+CVM(K) 

SEP  -SEP+(PR-P)**2 
SEQ  =SEQ+(QR-Q)  **2 
SEA  ■  SEA+(AR-A)  **2 
SEB  =SEB+(BR-B)**2 
♦**♦•  END  LOOP  ***♦♦ 

999  CONTINUE 

CALCULATE  MEAN  SQUARE  ERRORS.  MEAN  CVM 

MSEP=  SEP/NREPS 

MSEQ=  SEQ/NREPS 

MSEA=  SEA/NREPS 

MSEB=  SEB/NREPS 

MCVM-SUM1/NREPS 

CALCULATE  STD  DEV  OF  CVM  STATISTICS 
SUM=0 .0 

DO  3  K=1,NREPS 

SUM=SUM+(CVM(K)-MCVM) **2 
3  CONTINUE 

SDCVM*=(SUM/NREPS)**0.5 
WRITE  RESULTS  TO  FILE 

WRITE(7 . 103 )  MSEP, MSEQ, MSEA, MSEB.MCVM. SDCVM 
103  FORMAT(4(F10.6.3X)/2(F10.6,3X)) 

STOP 

END 
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